I. INTRODUCTION


The manual measurement and analysis of signal parameters
leading to identification of the signal source can require
more time and manpower than the results justify. Such a
process may be made less costly by applying computer
technology to measure the traditional parameters in a manner
relatively free of individual bias and to aid in the analysis
and identification procedures. An additional result of the
computer application is the capability to calculate
parameters unmeasurable by manual techniques. This thesis
investigates the process by which such parameters might be
obtained and used in an existing pattern recognition scheme.
The discussion progresses from the general principles of
Bayesian classification to the problem of finding features
which are useful in classifying signals from a specific
data base.

Section II of this thesis deals with the theory of
pattern recognition which underlies most automated
classification schemes. Section III discusses some of the
limitations imposed on this theory in a practical signal
identification problem. Section IV is devoted to the problems
which arise in the attempt to choose a set of features useful
and meaningful to the identification process. Section V
addresses the particular problems of choosing a set of
useful features based on the raster scan display of bauded
signal transition times. Included in this section are
the motivation for the initial choice of parameters and
the techniques used for both measurement and analysis. The
conclusions reached by this investigator, as well as those
areas deserving further study, are summarized in Section VI.

II. BAYESIAN PATTERN CLASSIFICATION


The process of mathematical pattern recognition formalizes
the methods by which the measured parameters or features
of a group of samples may be used to identify those
samples as belonging to one of several general classes.
Such a process may be considered to be composed of two
parts. The first part is a learning procedure in which the
general "pattern" of each of the various classes is
described in terms of the similarities and differences in the
sample features. The second part is a decision process in
which a feature-based algorithm is developed from the pattern
description and applied to the identification or
classification of new samples. Both parts of the process are
based on the theory of statistical inference and lead to a
probabilistic interpretation of the class membership. Closely
related to such an interpretation is the Bayesian model of
decision theory.

In the Bayesian model, the first step of a pattern
recognition scheme, the learning process, is that technique by
which a-priori class-conditional feature probabilities are
determined. The $K_i$ individual d-dimensional feature vectors
$x_1^{(i)}, x_2^{(i)}, \ldots, x_{K_i}^{(i)}$ of each of the m
distinct classes $C_i$ are used to estimate m class-conditional
probability density functions on the feature space:

$$ p(x \mid C_i) \approx p(x \mid x_1^{(i)}, \ldots, x_{K_i}^{(i)}) \tag{II-1} $$

Knowing these distributions for each class and the overall
class probabilities, $p(C_i)$, one may obtain the
feature-conditional class probabilities:

$$ p(C_i \mid x) = \frac{p(C_i)\, p(x \mid C_i)}{\sum_{j=1}^{m} p(C_j)\, p(x \mid C_j)} \tag{II-2} $$



Assuming an equal cost of misclassification, the Bayesian
decision rule chooses that class which maximizes the
feature-conditional probabilities of Equation II-2. That is, if

$$ p(C_i \mid x) > p(C_j \mid x) \quad \text{for all } j \neq i \tag{II-3} $$

the sample characterized by the feature vector x is assigned
to class $C_i$.

The above technique generates a series of decision 
regions in the d-dimensional feature space. Once the a-priori 
probabilities are determined, the class selection is simply 
an m-valued function of the feature vector. This is essen- 
tially the goal of the pattern recognition process. 
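
The decision rule of Equations II-2 and II-3 can be sketched in a few lines. The following is a minimal illustration only, assuming a one-dimensional feature and normal class-conditional densities; the function names and the two-class parameters (`PRIORS`, `MEANS`, `VARIANCES`) are hypothetical and do not come from the data base discussed in this thesis.

```python
import math


def gaussian_pdf(x, mean, var):
    """Univariate normal density, an assumed form for p(x|C_i)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posteriors(x, priors, means, variances):
    """Feature-conditional class probabilities p(C_i|x), as in Equation II-2."""
    joint = [p * gaussian_pdf(x, m, v) for p, m, v in zip(priors, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


def bayes_decide(x, priors, means, variances):
    """Equal-cost rule of Equation II-3: index of the largest posterior."""
    post = posteriors(x, priors, means, variances)
    return max(range(len(post)), key=post.__getitem__)


# Hypothetical two-class problem (illustrative values only).
PRIORS = [0.5, 0.5]
MEANS = [0.0, 4.0]
VARIANCES = [1.0, 1.0]
```

With these assumed parameters the decision regions are simply the half-lines on either side of x = 2, which illustrates how the class selection reduces to an m-valued function of the feature vector.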

The probabilistic Bayesian decision process discussed
above is mathematically elegant and optimal in the sense
that if the a-priori probabilities are known, it minimizes
the expected risk of misclassification. Unfortunately,
its implementation suffers major difficulties in both the
learning and the decision phases of a statistical pattern
recognition process.


A. PATTERN LEARNING 

In the learning portion of a pattern recognition problem,
one normally deals with a finite number of samples which
constitute a training set. These samples are the basis from
which the investigator is obliged to construct the necessary
a-priori class-conditional probability distribution functions.
Thus the well-defined probability structure assumed by
Bayes' decision rule is in reality a statistical inference
based on the data obtained from the training set and the
investigator's best guess of the functional form of this
data. Depending on how much of the general form of the
probability density function is presupposed, the estimation
techniques are characterized as parametric or non-parametric.

Typically, parametric estimation assumes that the features
of the training samples follow one of the well-known
probability distributions. The samples of the training set are
used to estimate only a few essential parameters of this
distribution. For example, one might use the training
samples to estimate the means and variances for an assumed
multivariate normal distribution. The non-parametric
technique, on the other hand, seeks to estimate directly the
probability density function as a superposition of "potential
function" terms arising from each of the training set samples.
The accuracy of the inferred density function is strongly
dependent on the type of estimation technique used, and may
be severely limited by the size of the training set. The
applicability and limitations of both non-parametric and
parametric estimation are discussed further in Appendix A.
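
The contrast between the two estimation styles can be sketched as follows. This is an illustration under stated assumptions, not the procedure of Appendix A: the parametric branch assumes a univariate normal form, the non-parametric branch uses a Gaussian kernel of arbitrary width `h` as the "potential function", and the `TRAINING` values are invented for the example.

```python
import math


def fit_normal(samples):
    """Parametric estimate: sample mean and unbiased variance of an assumed normal."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return mean, var


def normal_pdf(x, mean, var):
    """Univariate normal density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def parzen_pdf(x, samples, h):
    """Non-parametric estimate: average of Gaussian kernels of width h,
    one centered on each training sample."""
    return sum(normal_pdf(x, s, h * h) for s in samples) / len(samples)


# Hypothetical one-dimensional training set for a single class.
TRAINING = [1.8, 2.1, 2.4, 1.9, 2.6, 2.2]
MEAN, VAR = fit_normal(TRAINING)
```

The parametric estimate is summarized by two numbers, while the kernel estimate retains every training sample; with only six samples the choice of `h` largely determines the shape of the inferred density.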
Regardless of the type of estimation, the process involved
is an example of supervised learning. It presupposes a
set of correctly labeled samples from which the estimates
may be constructed. Information external to the features
being measured has been used to identify the members of
each class.


B. DECISION RULES

The previous discussion of the Bayesian approach to
signal classification already suggests one method of
obtaining a decision rule. For each new (unclassified)
feature vector, use the a-priori probability inferred from
the training set to calculate the feature-conditional
probability of its membership in each class; then choose that
class for which this probability is greatest.

Such a maximal likelihood criterion is intuitively
satisfying and can be formulated to lead to classifications
which are optimal in the sense of minimizing the average
risk of misclassification. Unfortunately, a naive
implementation of such a technique leads to computational
inefficiencies which often render the process impractical in
terms of memory and time constraints.

More frequently the decision rule chosen in a practical
pattern recognition problem is suboptimal in the Bayesian
sense, providing a method of partitioning the feature space
of the samples without resorting to the calculation of the
maximal likelihood criterion. These suboptimal schemes
use simplifying assumptions regarding the nature of the
underlying class-conditional probability distributions to
arrive at class boundaries which are relatively simple
functions of the features themselves. If there is little
overlap in the various class-conditional probability density
functions, the increase in the expected misclassification rate
as a result of the suboptimal decision is minimal.

For example, consider the case of the one-dimensional
distributions for the two equiprobable classes shown in
Figures 1a and 1b. Suppose that some suboptimal decision
rule results in decision boundary
$$ B' = B + \delta \tag{II-4} $$


The shaded areas of Figures 1a and 1b indicate the resultant
misclassification rates.

If there is substantial separation of the class
distributions, the differential error in classification as a
result of boundary misplacement, area D, is negligible. In
general, with decreasing overlap of unimodal class
distributions, not only does the total misclassification error
decrease, but so does its sensitivity to the placement of the
decision boundary. As a practical matter, the distinction
between a five percent and a two percent misclassification
rate seems unnecessarily fine if the probability distributions
were originally inferred from a small number of samples each.

(b) Small Overlap

FIGURE 1. Misclassification Rates Due to Suboptimal
Decision Boundaries
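
The sensitivity argument can be checked numerically. The sketch below is illustrative only: it assumes two equiprobable classes with unit-variance normal densities centered at plus and minus half a chosen `separation`, so that the optimal boundary lies at zero, and the names `error_rate` and `excess_error` are introduced here for the example.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def error_rate(boundary, separation):
    """Total misclassification rate for a given decision boundary."""
    low_class_above = 1.0 - phi(boundary + separation / 2.0)
    high_class_below = phi(boundary - separation / 2.0)
    return 0.5 * (low_class_above + high_class_below)


def excess_error(shift, separation):
    """Additional error caused by displacing the boundary from its optimum at zero."""
    return error_rate(shift, separation) - error_rate(0.0, separation)
```

For the same displacement of half a standard deviation, the excess error under these assumptions is roughly ten times smaller at a separation of six standard deviations than at a separation of one, in line with the qualitative claim above.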

In many cases the sub-optimal classification schemes
follow rather naturally from the method used to infer the
a-priori distributions. Unimodal feature distributions
give rise to classification schemes based on a test sample's
minimal "distance" from estimated class means. More complex
feature distributions involve the use of a distance measure
in the feature space to establish the K training set members
closest to the test sample; classification based on the
weighted vote of these K nearest neighbors seems closer in
spirit to non-parametric probability estimation. Both
these classification methods are explored more fully in
Appendix B.
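
Both suboptimal schemes can be sketched compactly. The version below is a simplification and not the method of Appendix B: it assumes Euclidean distance and an unweighted majority vote (the text describes a weighted vote), and the function names are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def nearest_mean(x, class_means):
    """Minimum-distance rule; class_means maps each label to its mean vector."""
    return min(class_means, key=lambda label: euclidean(x, class_means[label]))


def k_nearest_neighbors(x, training, k):
    """Unweighted vote among the k training samples closest to x.
    training is a list of (vector, label) pairs."""
    nearest = sorted(training, key=lambda item: euclidean(x, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

The nearest-mean rule stores only one vector per class, whereas the nearest-neighbor rule must retain and search the whole training set, which is the memory and time trade-off noted earlier.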

III. PRACTICAL CONSIDERATIONS IN THE IDENTIFICATION
OF NON-COOPERATIVE SIGNALS

The foregoing discussion of the basic techniques of
mathematical pattern recognition is intended to serve as
an outline of approaches to the general problems of
classification. Identification of the originator of a bauded
signal, while bearing considerable similarity to Bayesian
classification, is complicated by a number of factors not
normally arising in such problems as the identification of
handwritten characters.

In many of the classical problems of mathematical
pattern recognition, samples are obtained from a statistical
universe which is well-defined in the sense that the number
of classes to which the samples may belong is finite and
previously ascertained. Additionally, the parameters or
features used to typify these classes are normally
time-invariant. The set used for training is accurately
classified and sufficiently large that meaningful statistical
estimates of the parameters and their distributions may
be obtained.

Under these conditions, the processes of Section II
find their greatest application. Unfortunately, they are
seldom encountered in identifying the signals of a
non-cooperative originator. The data base used for a training
set does not normally provide arbitrarily accurate ground
truth and, in on-line practice, even the number of
actual originator classes may not be known. In many cases,
the paucity of signal intercept precludes more than the
grossest statistical parameter estimation. The parameters
themselves are time-dependent, since they are affected by
component aging, time-dependent signal propagation
conditions, and unknown external maintenance and adjustment.
The advantages of long-term averaging to obtain statistical
convergence, i.e. accurate a-priori estimates, may be
obscured by short-term perturbations for which adequate
correction may be prohibitively costly or undefinable.

In summary, the investigator proposing to use pattern
recognition techniques to identify the originator of hostile
signals is faced with a series of constraints involving
small sample size, time-varying signal parameters and class
probabilities, inaccurate ground truth, and an indeterminate
number of classes with which signals may be associated.
Such constraints impose severe limitations upon the pattern
recognition techniques which may be successfully applied to
the problem. The following sections indicate some of the
procedures which deal with these limitations.


A. SMALL SAMPLE SIZE 

The difficulties arising from small sample size are most
immediately apparent when one attempts to increase the number
of features in order to facilitate the separation of classes.
The dominant effect is one of inaccuracy arising from
undersampling the required d-dimensional a-priori probability
distributions. When this undersampling occurs, the most
likely result is that increasing the number of features
becomes counterproductive to "good" classification. In the
practical multi-category problem, it is not unusual to
encounter an initial set of as many as fifty features which
the designer believes to be useful in identification. The
number of training samples is often too small to allow
meaningful Parzen window or other non-parametric density
estimation techniques over this number of dimensions.

An alternative procedure of characterizing the a-priori
distribution function by use of the most general
multivariate normal form requires estimation of a "d x d"
non-singular covariance matrix. This imposes an algebraic
requirement of at least d+1 samples. However, experience in
the practical estimation of covariance matrices suggests
that at least two to three times this minimum algebraic
requirement of d+1 samples is preferable [7, 11]. Frequently
the number of available training samples for such complete
estimation of the individual class covariance matrices is
inadequate.

Two common methods of correcting this difficulty involve
somewhat arbitrary assumptions about the nature of the class
covariance matrices. One possible assumption is that all
classes share the same covariance matrix $S$, which is then
obtained by averaging the individual class covariance
matrices $S_i$:

$$ S = \frac{1}{m} \sum_{i=1}^{m} S_i, \qquad S_i = \frac{1}{K_i - 1} \sum_{k=1}^{K_i} \left(x_k^{(i)} - \mu_i\right)\left(x_k^{(i)} - \mu_i\right)^T \tag{III-1} $$


A second method assumes statistical independence of the
features and makes all off-diagonal elements zero,
regardless of evidence to the contrary. This approach preserves
additional detail about individual class structures at the
expense of disregarding highly correlated features. It
requires only the calculation of single-feature class
means and variances, as indicated in Equations A-1 and A-2.
Although such assumptions are almost surely incorrect, they
often lead to better classifier performance than a maximum
likelihood estimator [7].
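
The two simplifying assumptions can be compared on a toy example. The following sketch is illustrative and assumes an unweighted average of unbiased class covariance estimates for the shared matrix; the helper names are introduced here and are not taken from the thesis or its appendices.

```python
def class_mean(samples):
    """Componentwise mean of a list of d-dimensional samples."""
    n, d = len(samples), len(samples[0])
    return [sum(s[j] for s in samples) / n for j in range(d)]


def class_covariance(samples):
    """Unbiased d x d covariance estimate for one class."""
    n, d = len(samples), len(samples[0])
    mu = class_mean(samples)
    return [[sum((s[r] - mu[r]) * (s[c] - mu[c]) for s in samples) / (n - 1)
             for c in range(d)] for r in range(d)]


def pooled_covariance(classes):
    """First assumption: one shared matrix, the average of the class matrices."""
    covs = [class_covariance(samples) for samples in classes]
    m, d = len(covs), len(covs[0])
    return [[sum(cov[r][c] for cov in covs) / m for c in range(d)] for r in range(d)]


def diagonal_covariance(samples):
    """Second assumption: independent features, off-diagonal terms forced to zero."""
    cov = class_covariance(samples)
    d = len(cov)
    return [[cov[r][c] if r == c else 0.0 for c in range(d)] for r in range(d)]
```

Both functions reduce the number of quantities that must be estimated per class, which is the practical benefit when fewer than a few multiples of d+1 training samples are available.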

Small sample size also leads to complications when one
attempts to estimate the error rate of the classification
algorithm finally adopted. If one chooses to estimate the
classifier's error rate from the assumed parametric model,
one finds the result is optimistic to the extent that the
training samples are peculiar and unrepresentative [21].

An empirical approach to determining error rate avoids
this problem by running the classifier on a set of test
samples. Where one is faced with a small number of available
signals for which ground truth is accurate, the confidence
which may be placed in the results of the empirical scheme
is marginal. For example, if two errors are made in ten
test samples, one may predict with 95% confidence that the
true error rate of the classifier lies somewhere between
five and fifty-five percent. The figure of ten test
samples is not unrealistic if one uses different design and
test samples to avoid the hazards of "testing on the training
data."
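
The width of such an interval can be reproduced with an exact binomial calculation. The sketch below assumes the Clopper-Pearson construction, which may not be the one behind the figures quoted above; for two errors in ten trials it yields roughly 2.5 to 56 percent, broadly consistent with the wide range cited, with the lower bound depending on the construction used.

```python
from math import comb


def binomial_cdf(k, n, p):
    """P(X <= k) for X distributed as Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k + 1))


def bisect_decreasing(f, lo=0.0, hi=1.0, iterations=60):
    """Root of a function that falls from positive to negative on [lo, hi]."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exact_error_interval(errors, n_tests, confidence=0.95):
    """Two-sided exact (Clopper-Pearson) interval for the true error rate."""
    tail = (1.0 - confidence) / 2.0
    if errors == 0:
        lower = 0.0
    else:
        # P(X >= errors) rises with p; solve for the p giving the tail probability.
        lower = bisect_decreasing(
            lambda p: tail - (1.0 - binomial_cdf(errors - 1, n_tests, p)))
    if errors == n_tests:
        upper = 1.0
    else:
        # P(X <= errors) falls with p; solve for the p giving the tail probability.
        upper = bisect_decreasing(
            lambda p: binomial_cdf(errors, n_tests, p) - tail)
    return lower, upper


LOWER, UPPER = exact_error_interval(2, 10)
```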


B. INDETERMINANCE OF CLASSES 

The absence of a predetermined number of classes presents
additional problems which must be addressed in any practical
identification scheme. The first problem is that of
establishing the criteria for excluding a given sample as a
member of any of the previously determined classes. The second
problem is one of associating or "clustering" the resultant
"unclassified" signals into possible new classes.

Within the formal structure of minimal risk Bayesian
classification, one may introduce the concept of an
additional class which corresponds to the heuristic category
"I don't know." More frequently, this category is
interpreted as that portion of the feature space for which no
feature-conditional class probability exceeds a given
threshold. That is:

$$ p(C_i \mid x) < \lambda \quad \text{for all } i \tag{III-2} $$



An alternative method of establishing this threshold
for multivariate normal distributions is to specify a
maximum Mahalanobis distance from the established class
means. A sample is excluded from class $C_i$ when

$$ r_i^2 = (x - \mu_i)^T S_i^{-1} (x - \mu_i) > \Lambda \tag{III-3} $$

so that the surface $r_i^2 = \Lambda$ forms the boundary
within which a sample will be included in a given class.
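
A distance-based reject option of this kind can be sketched as follows. The example is restricted to two-dimensional features so that the covariance inverse can be written out explicitly; the function names, the class dictionary layout, and the threshold value used in any call are assumptions made for illustration.

```python
def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance for 2-D features, cov = [[a, b], [b, c]]."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form d^T cov^{-1} d with the 2 x 2 inverse expanded by hand.
    return (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det


def classify_with_reject(x, classes, max_distance_sq):
    """Return the label of the nearest class mean, or None when the sample lies
    beyond the threshold distance from every class.
    classes maps each label to a (mean, covariance) pair."""
    best_label, best_d2 = None, float("inf")
    for label, (mean, cov) in classes.items():
        d2 = mahalanobis_sq(x, mean, cov)
        if d2 < best_d2:
            best_label, best_d2 = label, d2
    return best_label if best_d2 <= max_distance_sq else None
```

Returning `None` plays the role of the "I don't know" category: such samples are the candidates for the clustering step described next.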

The second problem, that of clustering the "unclassified"
signals, is less well defined from the standpoint of Bayesian
analysis. It represents one of a group of problems known
as unsupervised learning. Jarvis and Patrick [12] present
several techniques by which such clustering may be performed
and illustrate the advantages in graphically displaying
the clustering process.

One such clustering technique used in several
identification systems is illustrated for a two-dimensional
case in Figure 2. Thresholds based on Mahalanobis distance are
established for each cluster. If the Mahalanobis distance
for a given sample, such as $S_1$, exceeds the outer threshold
distance from the mean of every known cluster, that sample
is considered to represent a new cluster and possibly a new
signal originator. For this new cluster, arbitrary distance
thresholds based on "typical" class covariance are
assumed. Subsequent samples ($S_2$) lying within the inner
threshold are used to update the class statistics; those
samples ($S_3$) lying between the two thresholds are
tentatively associated with the class, but do not update the
class statistics.





FIGURE 2. Guard Zone Clustering (diagram labels: "NEW" CLUSTER,
REJECT THRESHOLD)
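
The guard zone procedure can be sketched as a single pass over the samples. This is a simplified illustration, not the implementation used in any fielded system: it assumes one fixed set of per-feature variances standing in for the "typical" class covariance, updates only the cluster mean, and uses hypothetical names throughout.

```python
def scaled_distance_sq(x, mean, variances):
    """Squared distance with each feature scaled by an assumed typical variance
    (a Mahalanobis distance for a diagonal covariance)."""
    return sum((xi - mi) ** 2 / v for xi, mi, v in zip(x, mean, variances))


class Cluster:
    def __init__(self, first_sample):
        self.mean = list(first_sample)
        self.count = 1
        self.tentative = []  # guard-zone samples: associated, but not used in updates

    def update(self, x):
        """Running-mean update with a sample inside the inner threshold."""
        self.count += 1
        self.mean = [m + (xi - m) / self.count for m, xi in zip(self.mean, x)]


def guard_zone_cluster(samples, variances, inner_sq, outer_sq):
    """Assign samples in order: start a new cluster beyond the outer threshold,
    update statistics inside the inner one, and hold the rest as tentative."""
    clusters = []
    for x in samples:
        nearest, d2 = None, float("inf")
        for cluster in clusters:
            candidate = scaled_distance_sq(x, cluster.mean, variances)
            if candidate < d2:
                nearest, d2 = cluster, candidate
        if d2 > outer_sq:
            clusters.append(Cluster(x))
        elif d2 <= inner_sq:
            nearest.update(x)
        else:
            nearest.tentative.append(x)
    return clusters
```

Keeping guard-zone samples out of the statistics limits the drift of a cluster mean toward samples of doubtful membership, at the cost of making the result depend on the order in which samples arrive.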

It is during the clustering procedure that the
man-machine interface becomes a critical factor in the pattern
recognition process. An equally important factor in the
efficiency of the clustering algorithms is a technique for
feature selection which provides an adequate basis for
representing new classes. The initial investigator has the
obligation to use a training set which as nearly as
possible represents the entire spectrum of possible feature
parameters.


C. INACCURATE GROUND TRUTH

The difficulty of obtaining accurate correspondence
between the elements of the initial training set and known
signal originators represents the most persistent and
perplexing dilemma in the design of pattern recognition devices.
Those signals for which there is a high degree of probability
that the originator can be determined from external
information may not adequately represent the entire spectrum
of signals. Conversely, the originators of a more
representative group of signals may not be known.

One may attempt to choose self-consistent classes
through the use of a clustering algorithm and the
application of a probability estimator to information confirming
class membership, a technique best described as "learning
with a probabilistic teacher." This approach, discussed
more fully by Cooper [4], is still rather exploratory in
nature and is implemented at considerable cost in timeliness
and computational efficiency.

Since the empirical methods by which any new
classification performance is evaluated must use as ground truth
the identifications of an older system, indications of any
improved performance in the new system are perforce
intuitive. This is the crux of the circular dilemma which arises
when one is forced to estimate not probability of error, but
probability of disagreement. In the event of disagreement,
which system is right?

IV. FEATURE SELECTION IN WAVEFORM ANALYSIS


One important question regarding the process of waveform
identification has not yet been addressed. All of the
techniques described so far have indicated that one commences
this process with a particular set of feature measurements.
In the case of a waveform z(t), a sufficiently general
feature vector might be the sequential samples of the
continuous signal, i.e. $z(t_1), z(t_2), \ldots, z(t_n)$. If these
samples occur at greater than the Nyquist rate, one may
reasonably assume that they constitute an adequate
description of the signal. One could then proceed directly to
the calculation of n-dimensional probability distributions
for each class, blithely oblivious to the fact that n
may be on the order of 500 or more.

In light of the advantages in the choice of a small number
of mutually independent, well-separated features for the
process of pattern identification, it is normally advisable
to seek some mapping, F, of the signal space Z into a
pattern space X. That is, we wish to find a preprocessing
transformation so that the pattern space X = F(Z) satisfies
the following objectives:

(1) low dimensionality
(2) retention of sufficient information
(3) enhancement of distance in pattern space as a measure
    of the similarity of physical patterns
(4) comparability of features among samples.

Note that (1) and (2) imply elimination of redundant information.

In addition to the above objectives, one might also wish
to obtain a pattern space representation adequate for the
construction of new classes in an unsupervised learning
procedure, and it is convenient that the individual
components of the pattern space have some natural
interpretation which might provide qualitative information about
the underlying causes of class difference.

As a first step toward eliminating redundant
information in a pattern space satisfying the above criteria, it
is frequently desirable to represent the sampled waveform
as a linear combination of orthonormal functions. Two
commonly used orthonormal expansions satisfying slightly
different training set mean square error criteria are
discussed in Appendix C. The coefficients of these functions
may then be used as features of the signal. The choice of
orthonormal functions used in the expansion, however, need
not be one which necessarily minimizes a form of mean
square error in the representation of the signal. If one
is willing to accept a slightly greater amount of error in
modeling the training set waveforms, then one possible
technique is to use a set of orthonormal functions for
which it is computationally efficient, as it is with the
fast Fourier transform, to obtain coefficients. By
truncating the number of coefficients one may achieve some
reduction of dimensionality.
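
Truncation of a Fourier expansion can be sketched as below. For self-containment the example evaluates the discrete Fourier sums directly rather than with a fast Fourier transform, which gives the same coefficients at greater cost; the normalization by the number of samples and the function name are choices made here for illustration.

```python
import cmath
import math


def truncated_fourier_features(samples, n_coeffs):
    """First n_coeffs discrete Fourier coefficients of a sampled waveform,
    normalized by the number of samples. Direct summation is used here;
    an FFT would return the same values more efficiently."""
    n = len(samples)
    features = []
    for f in range(n_coeffs):
        total = sum(samples[k] * cmath.exp(-2j * math.pi * f * k / n)
                    for k in range(n))
        features.append(total / n)
    return features


# Hypothetical waveform: three cycles of a cosine over 64 samples.
WAVEFORM = [math.cos(2.0 * math.pi * 3 * k / 64) for k in range(64)]
FEATURES = truncated_fourier_features(WAVEFORM, 8)
```

Here a 64-sample waveform is reduced to 8 complex features, and all of the signal energy below the truncation point is carried by the coefficient at index 3.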

The features derived from the orthonormal projections
of the sampled signal described in Appendix C represent
linear transformations of the sampled waveform data. For
the purposes of signal identification, this representation
may not be adequate.

Since one of the goals in this application of pattern
recognition techniques is to assist a human analyst in
making an identification, it is particularly desirable that
the resultant features derived from preprocessing have some
degree of natural interpretation. The optimal linear
transformations discussed do not always lend themselves to easy
extension in an unsupervised learning mode.

Features which are more directly related to an assumed
model of the signal generating process, and which are often
easily implemented as a measurement procedure, may in fact
involve non-linear transformations of the waveform data. In
selecting features of this type, the prior experience of
the data analyst and the details of the signal generating
model serve as a focus for the development of non-linear
preprocessing.

Quite commonly, the initial feature space of the signal
will include components derived from readily interpretable
data transformations, both linear and non-linear in nature.
This feature set is normally too large for convenient
computation and some of the features thus obtained may be
dependent. The techniques of linear dimensionality reduction
described in Appendix C may be applied to this set of
features, particularly when graphical display of the
individual signal is required. In order to provide a feature
set with dimensionality low enough to ensure computationally
convenient classification and general enough to allow the
establishment of new class clusters, it is both sufficient
and desirable to reduce dimensionality while retaining the
identity of individual features by selecting those n of
m features which lead to the optimal separation of classes.
The selection process then requires both the adoption of a
criterion by which class separability may be estimated and
the development of a feature search algorithm. Both of
these topics are discussed in Appendix C.

V. FEATURE MAPPINGS OF RASTER SCAN DISPLAYS


As a practical application of the signal classification
techniques previously described, the author conducted a
series of investigations which used as a data base the raster
scan displays of bauded, frequency-shift-keyed (FSK)
signals transmitted by several originators. The signal data
base, many of the associated processing algorithms, and the
processing equipment itself were made available through the
cooperation of Electromagnetic Systems Laboratories in
Sunnyvale, California. The data base and supporting
facilities are the results of the company's development of the
Parameter Encoder in a Navy-sponsored program for signal
measurement and identification.

The objective of the author's investigation was to
isolate from the raster scan data a small set of features which
might prove useful in automated signal identification. As
part of the existing clustering and identification
technique, the raster scan pattern of a given signal was
presented to the system operator for visual comparison with
those of previously identified signals. It was claimed
that such comparison was of value in the final stages of
the identification process, when the classification of the
signal in question had been narrowed to a few possibilities.
Display of the information necessary to produce the required
number of raster scan patterns, however, consumes a significant
amount of computer time. The interests of improved
efficiency, coupled with the author's desire to quantify a
process open to considerable latitude in operator judgment,
provided the motivation for the project.

In obtaining a reasonable initial set of parameters for
consideration by this thesis, a study of the signal generating
process and the characteristics of the raster scan display
led to a slight modification of the normal minimal phase
raster representation. The resulting "phase-unwrapped"
raster display was the basis of a series of measurements
related to clock rate, transient phenomena, and possible
data-dependent or unintentional external rate modulation.
A model of the signal process, the raster scan display, and
the methods used to measure the initial set of features
are discussed in the following sections.


A. SIGNAL GENERATION PROCESS

Since the raster scan display is based essentially on
zero crossing information, it represents a non-linear
transformation of the incident signal. It is convenient to refer
to a model of the signal generating process in an attempt
to justify use of such a transformation. From this model
the qualitative effects of the various steps in the
generation of the raster scan pattern can be estimated. Figure 3
provides an outline of the process which leads to the raster
scan display. All of the signal data base information
available for the feature measurements proposed by this thesis is
contained in the transition time measurements provided by
the edge detector.

Of all the qualitative effects upon transition time
measurements, those associated with the originator's clock
and bit-stream generator are potentially of greatest value
to a raster-scan-based identification scheme. Class-to-class
differences in the basic clock rate are certainly of
interest, as are clock rate variations due to component
instability and unintentional external rate modulation.
If the originator's clock is not continuously running,
transient phenomena associated with clock turn-on may be
apparent in the raster display. The binary bit stream
generator (data modulator) and non-linearities in the FSK
modulator itself may produce an unequal mark-space duty
cycle or data-dependent mark-space asymmetries which appear
as bias in the raster display. Such effects should produce
features of the raster display which serve to characterize
the originator. To isolate them effectively, however, may
require the use of non-linear waveform data transformations.
Other parts of the signal generating process may
introduce qualitative effects which tend to obscure those produced
by the source. For example, noise and signal attenuation
in the propagation path may introduce FM spikes and
anomalous transition times. Non-linearities and zero-level
offset in the FM discriminator may produce additional bias in
the measured mark and space durations. The waveform sampling
procedure itself introduces an unavoidable time quantization
error which may or may not be uniformly distributed on a
short-term basis. If this error is uniformly distributed,
one can expect a transition time error variance $\sigma^2$, where

$$ \sigma^2 = \frac{(\Delta t)^2}{12} \tag{V-1} $$

and where $\Delta t$ is the sampling period.
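
The variance of Equation V-1 is the standard result for an error uniformly distributed over one sampling period, and it can be checked by simulation. The sketch below assumes that each true transition time is rounded to the nearest sample instant; the function name, trial count, and seed are arbitrary choices for the illustration.

```python
import random


def quantization_error_variance(sample_period, n_trials=100_000, seed=0):
    """Empirical variance of the error made by rounding random transition
    times to the nearest multiple of the sampling period."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_trials):
        t = rng.uniform(0.0, 1000.0 * sample_period)
        quantized = round(t / sample_period) * sample_period
        errors.append(quantized - t)
    mean = sum(errors) / n_trials
    return sum((e - mean) ** 2 for e in errors) / n_trials


SAMPLE_PERIOD = 0.1
SIMULATED = quantization_error_variance(SAMPLE_PERIOD)
PREDICTED = SAMPLE_PERIOD ** 2 / 12.0
```

Under the uniformity assumption the simulated and predicted values agree to within sampling fluctuation; when the error is not uniformly distributed on a short-term basis, as the text cautions, the formula need not hold.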


B. RASTER SCAN DISPLAY

The use of a raster scan display arises rather naturally
in the attempt to represent the fine details of clock rate
variation. The axes of the display correspond to coarse and
fine divisions of time in the following manner. Suppose
some event, such as a mark-space transition, is assumed to
occur only at times characterized by some integer multiple
of a nominal period T. The actual time of occurrence, t, may
then be represented by

$$ t = nT + z, \quad n \text{ an integer} \tag{V-2} $$

By equating the horizontal axis of the raster display to
the actual time variable, t, and the vertical axis to the
fine time z, any arbitrary time may be represented as a
point on the sawtooth waveform indicated in Figure 4b. The
raster display has the effect of mapping the points $t_i$
in a one-dimensional region into the points $(z(t_i), t_i)$
of a two-dimensional region. Since the fine time variable,
z, represents a lag or lead relative to the nominal period,
one can relate the points of the raster scan display to
the samples of some continuous phase perturbation.
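
The mapping from event times to raster points amounts to splitting each time into a whole number of nominal periods and a remainder. A minimal sketch, assuming the fine time is measured from the start of the current period and returning each point as a `(t, z)` pair; the function name is hypothetical:

```python
import math


def raster_points(event_times, nominal_period):
    """Map each event time t to a raster point (t, z), where z = t - n*T
    and n is the number of whole nominal periods elapsed before t."""
    points = []
    for t in event_times:
        n = math.floor(t / nominal_period)
        points.append((t, t - n * nominal_period))
    return points
```

For example, with a nominal period of 1.0 the event times 0.0, 1.25, and 2.5 map to fine times 0.0, 0.25, and 0.5, so a steadily growing lag appears as a rising line of points.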

For example, consider the continuous phase function
illustrated in Figure 4a. The function is comprised of a
linear part and a perturbation,

$$ \phi(t) = 2\pi \left[ \frac{t}{T} + z(t) \right] \tag{V-3} $$

where T is the nominal period of the raster scan display.
If samples of this function are taken at times $t_i$ when

$$ \phi(t_i) = 2\pi n, \quad n = 1, 2, \ldots \tag{V-4} $$

and if

$$ |z(t)| < \tfrac{1}{2} \quad \text{for all } t \tag{V-5} $$

then the minimal time difference from the expected sample
time is proportional to the phase perturbation. The raster
scan points of Figure 4b represent non-uniform samples of
this continuous perturbation. In this sense one can
represent the events occurring at times $(t_1, t_2, \ldots)$ as
samples $(z(t_1), z(t_2), \ldots)$ of some continuous phase
z(t), where

$$ z(t_n) = t_n - n_{\max} T $$

where

$$ n_{\max} = \left\lfloor \frac{t_n}{T} \right\rfloor $$

i.e. the greatest integer less than $t_n / T$.

Figures 5a through 5d illustrate the raster patterns
of events t_i of arbitrary starting phase and periodicity
different from the nominal period of the raster scan. If
the nominal raster period coincides with the actual
periodicity, as in Figures 5a and 5b, the slope of the line
joining points of the raster pattern is zero. If the
periodicity of the series of events differs slightly from
the nominal raster period by a small amount \epsilon, the
raster pattern may be characterized by a line whose slope is
given by:

    [The slope equation is not legible in the source scan.]

as shown in Figures 5c and 5d.


In the signal display of the Parameter Encoder both
mark-space and space-mark transition times are plotted on a
raster scan whose nominal period is approximately one-half
the time for a single mark-space cycle; that is, the raster
period is equal to the baud length. The alternate points
of the raster scan exhibit a bias term if the mark and
space durations are not equal. If the period of the entire
mark-space cycle remains constant, lines connecting alternate
points of the raster scan have the same slope, but they
are vertically offset by the difference in mark and space
durations. Figure 6 illustrates this condition.

The effect of the uniform sampling of the waveform and
the subsequent time quantization of events is apparent when
the nominal period of the raster scan is on the order of
ten times that of the sampling period. Figures 7 and 8
illustrate the effect of quantization noise in two raster
scan patterns of a slowly varying signal, one of which
uses a nominal period that is an integer multiple of
the sampling time. This quantization phenomenon was evident
in the raster displays of the signal data base used in this
thesis since the nominal baud length is on the order of
twenty times the sampling period.

The abrupt transition in Figures 7 and 8 also serves to
illustrate the "aliasing" effect which may occur in a minimal
phase representation if the time difference between the
actual and the nominal event exceeds one-half the nominal
period. When this aliasing occurs the points plotted on
the raster scan are no longer proportional to the assumed
continuous perturbation. In order to maintain proportionality
it is preferable to represent the time of events by a minimal
phase difference plot. Such a procedure performs a certain
amount of "phase unwrapping" by assuming that the absolute
raster time difference between two successive points should
never exceed T/2. Figure 9 illustrates the effect of this
minimal phase difference algorithm. Since each segment of
the sawtooth pattern raster scan may be extended either
forward or backward in time, the minimal phase difference
representation uses this extension to represent those points
for which the minimal phase difference is less than T/2.
In Figure 9, those points occurring in the heavily lined
region indicate the normal raster scan pattern while a
continuous line joins the points of the minimal phase
difference pattern. Since some of the "aliasing"-associated
difficulties in the measurement of features may be eliminated
by use of a minimal phase difference, this representation
was included as the initial step in the measurement process.
Figure 26 outlines the flow of the algorithm actually used
in the measurement program.
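
The idea behind the minimal phase difference representation can
be sketched as follows. This Python fragment is an illustrative
assumption and does not reproduce the flow of Figure 26; the
function name is hypothetical. Each step between successive
raster points is folded into the range from -T/2 to +T/2 and the
folded steps are accumulated.

```python
def minimal_phase_difference(z, T):
    """Unwrap raster fine times so that successive points never
    differ by more than T/2 in magnitude."""
    if not z:
        return []
    unwrapped = [z[0]]
    for previous, current in zip(z, z[1:]):
        step = current - previous
        step -= T * round(step / T)   # fold the step into [-T/2, +T/2]
        unwrapped.append(unwrapped[-1] + step)
    return unwrapped


if __name__ == "__main__":
    # A slow upward drift that wraps around the top of the raster.
    wrapped = [0.8, 0.9, 0.0, 0.1, 0.2]
    print(minimal_phase_difference(wrapped, T=1.0))
```

The sketch succeeds only while the true change between successive
points stays below half the nominal period, which is the premise
stated in the text.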


C. RASTER SCAN DESCRIPTIVE PARAMETERS

The first efforts of the author's investigation of the
raster scan data were directed towards obtaining a rather
large set of descriptive parameters, particularly those
which seemed useful in the quantitative representation of
the effects anticipated from a study of the model signal
generating process. Manipulations of the transition time
data to provide information about the nominal clock rate,
its transient variations and external modulations, as well
as source-associated mark-space bias and data-dependent
transition time anomalies, were the areas of particular
interest. The parameters considered included representations
of all or part of the signal's raster phase by polynomial
mean square fits, Laguerre polynomials, and Fourier
power spectrum terms, as well as measures of mean bias,
intrinsic and total signal variance. The techniques
employed to calculate these parameters are described in
the following sections.

1. Mean Bias and the Effect of Transients

Although several sources of apparent mark-space bias
other than those associated with the originator have been
indicated in the study of the process model, the measurement
of this parameter may provide an adequate estimate of
the source-associated component. More recent studies of
new signal data, for which the bias effects not associated
with the originator are believed to be negligible, indicate
that such source-dependent bias not only exists but can in
fact be time-varying.

The simplest estimate of mean bias may be obtained
by the straightforward averaging of the raster phase difference
of alternate points on the raster scan. For the raster
scan display of the alternate positive and negative
transitions, z_i^{(1)} and z_i^{(2)}, the mean bias is given by:

    b = \frac{1}{n} \sum_{i=1}^{n} z_i^{(1)} - \frac{1}{n} \sum_{i=1}^{n} z_i^{(2)}
      = \bar{z}^{(1)} - \bar{z}^{(2)}                         (V-8)

which is the difference in the mean raster phase of the two
bias conditions.

One difficulty which arises from this representation
of mean bias occurs as a result of clock transient behavior.
If at any point t_i, the raster phase z(t_i) differs from the
raster phase of the starting point by an amount approximately
equal to one-half the nominal raster period, the
cyclic representation (minimal phase) of the raster display
introduces an "aliasing" error into the mean bias estimate
of Equation V-8.

To eliminate such errors in the bias estimate, one
must use the minimal phase difference representation discussed
previously for each bias condition.
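
The effect of this precaution on the bias estimate can be
illustrated with a short Python sketch. It is an assumption-laden
illustration rather than the thesis program: the helper names are
hypothetical, the two bias conditions are taken to be the even
and odd numbered raster points, and the unwrapping rule is the
simple step-folding rule with limit T/2.

```python
def unwrap(z, T):
    """Fold each successive step into [-T/2, +T/2] and accumulate."""
    out = [z[0]]
    for previous, current in zip(z, z[1:]):
        step = current - previous
        step -= T * round(step / T)
        out.append(out[-1] + step)
    return out


def mean_bias(z, T):
    """Difference of the mean raster phase of the two bias conditions,
    each unwrapped separately before averaging."""
    first = unwrap(z[0::2], T)    # e.g. positive-going transitions
    second = unwrap(z[1::2], T)   # e.g. negative-going transitions
    return sum(first) / len(first) - sum(second) / len(second)


if __name__ == "__main__":
    # Both conditions drift upward; the first wraps past the raster top.
    z = [0.90, 0.70, 0.95, 0.75, 0.00, 0.80, 0.05, 0.85]
    naive = sum(z[0::2]) / 4 - sum(z[1::2]) / 4
    print("naive:", naive, " unwrapped:", mean_bias(z, T=1.0))
```

In this example the naive average of the wrapped values has the
wrong sign, while the unwrapped estimate recovers the constant
offset of 0.2 between the two sets of points.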

If the transient behavior of the clock rate is sufficiently
violent, even the minimal phase representation may
not be adequate. For example, assume that the instantaneous
originator clock frequency f is characterized by turn-on
from a dead stop:

    f = f_{ss} [1 - \exp(-at)]                                (V-9)

where a is the reciprocal of the turn-on time constant, \tau. The
instantaneous phase \phi(t) in cycles is obtained by integrating
f from 0 to t:

    \phi(t) = \int_0^t f(s)\,ds
            = f_{ss} t + \frac{f_{ss}}{a} [\exp(-at) - 1]    (V-10)

If transitions occur when \phi(t) = 0, 1, 2, ... (Figure 10a),
the raster scan points of Figure 10b are obtained. It should
be noted that the transition times are non-uniform samples
of the actual phase. Similar computations may be performed
for the raster phase of a clock whose transient frequency
behavior is characterized by a shift from a free running
frequency, f_0, to the steady state frequency, f_{ss}:

    f = f_0 \exp(-at) + f_{ss} [1 - \exp(-at)]                (V-11)

    \phi(t) = \frac{f_{ss} - f_0}{a} [\exp(-at) - 1] + f_{ss} t      (V-12)


FIGURE 10. Relation of Exponential Transient to
Raster Display

The above results suggest that the raster-phase
pattern or a minimal phase difference pattern may be
extremely misleading during the transient period. It is
evident that the predicted raster phase (in cycles) may
differ from the actual steady state phase by an amount
equal to

    \frac{f_{ss} - f_0}{a} = (f_{ss} - f_0)\,\tau             (V-13)

which may be considerably greater than one cycle. For
example, if T_0 = 0.9 T_{ss} and \tau = 20 T_{ss}, then the
difference in steady state phase may be more than two
complete cycles. Using the same time constant \tau, turn-on
from a dead stop (f_0 = 0) results in a steady state phase
difference of twenty cycles (from nominal). The raster scan
display of Figure 11 serves to illustrate the transient
turn-on effects observed when \tau = 10 T_{ss}. In such a case
the anti-aliasing capability of the phase unwrapping process
is severely strained and it would be wise to avoid measurement
of mean bias in this transient region.
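
The size of this effect can be worked through numerically. The
Python sketch below is an illustration under stated assumptions:
it uses the exponential frequency shift from a free running
frequency f0 to a steady state frequency f_ss with time constant
tau, integrates it in closed form, and locates transition times
by bisection. The function names are hypothetical.

```python
import math


def phase_cycles(t, f_ss, f0, tau):
    """Phase in cycles at time t for a clock relaxing from f0 to f_ss."""
    return (f_ss - f0) * tau * (math.exp(-t / tau) - 1.0) + f_ss * t


def steady_state_phase_lag(f_ss, f0, tau):
    """Cycles by which the phase trails an ideal clock that ran at
    f_ss from time zero, once the transient has died out."""
    return (f_ss - f0) * tau


def transition_times(count, f_ss, f0, tau):
    """Times at which the phase reaches 1, 2, ..., count cycles."""
    times = []
    for k in range(1, count + 1):
        lo = 0.0
        hi = (k + max(0.0, (f_ss - f0) * tau)) / f_ss  # phase(hi) >= k
        for _ in range(100):                           # bisection
            mid = 0.5 * (lo + hi)
            if phase_cycles(mid, f_ss, f0, tau) < k:
                lo = mid
            else:
                hi = mid
        times.append(hi)
    return times


if __name__ == "__main__":
    f_ss, tau = 1.0, 20.0                  # tau = 20 steady state periods
    print(steady_state_phase_lag(f_ss, 0.0, tau))          # dead stop
    print(steady_state_phase_lag(f_ss, f_ss / 0.9, tau))   # T0 = 0.9 Tss
    print(transition_times(3, f_ss, 0.0, tau))
```

With the time constant set to twenty steady state periods, the
dead stop case gives a lag of twenty cycles and the case
T0 = 0.9 Tss gives a phase difference of slightly more than two
cycles in magnitude, in agreement with the figures quoted in the
text.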

2. Pattern Variance and Intrinsic Variance

Pattern variance and intrinsic variance measure
the extent to which the raster pattern is explained by the
nominal clock rate and the minimal phase difference
representation. Large values of total pattern variance indicate
that the nominal raster chosen does not completely explain
the pattern. Large values of the intrinsic variance suggest
that a minimal phase difference representation is not
adequate, perhaps as a result of violent transients or
noise.

These two measures are averaged over points of both
bias conditions, z^{(1)} and z^{(2)}, corresponding to the alternate
positive and negative going transitions. The total variance
V_T is obtained from the following calculation:

    [Equation V-14 is not legible in the source scan.]

Intrinsic variance (V_I) is estimated by:

    [Equation V-15 is not legible in the source scan.]

where z_j^{(i)} refers to the jth point of the set of either
positive or negative going transitions.

These two terms may be particularly useful as a
measure of signal degradation. They may be used as a measure
of non-parametric correlation since the fraction of total
variance "explained" by the intrinsic variance is given by:

    A(z) = 1 - \frac{V_I^2}{V_T^2}                            (V-16)

One estimate of non-parametric correlation is:

    B(z) = \sqrt{A(z)}                                        (V-17)

Small values of B(z) suggest that the apparent behavior of
the raster pattern is not well characterized by a relatively
smooth function of time.
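
A rough numerical illustration of such a smoothness measure is
given below. The exact variance formulas used in the thesis are
not recoverable here, so this Python sketch substitutes two
common estimators as explicit assumptions: the ordinary sample
variance for the total variance, and one half of the mean squared
successive difference for the intrinsic variance. All function
names are hypothetical.

```python
import math


def total_variance(z):
    """Sample variance of the raster phase about its mean (assumed form)."""
    n = len(z)
    mean = sum(z) / n
    return sum((v - mean) ** 2 for v in z) / (n - 1)


def intrinsic_variance(z):
    """Half the mean squared successive difference (assumed form)."""
    squared_steps = [(b - a) ** 2 for a, b in zip(z, z[1:])]
    return 0.5 * sum(squared_steps) / len(squared_steps)


def smoothness_correlation(z):
    """Square root of one minus the intrinsic-to-total variance ratio."""
    v_total = total_variance(z)
    if v_total == 0.0:
        return 1.0                      # a constant pattern is smooth
    a = 1.0 - intrinsic_variance(z) / v_total
    return math.sqrt(max(a, 0.0))       # clamp: the ratio may exceed one


if __name__ == "__main__":
    ramp = [0.01 * i for i in range(100)]   # smooth drift
    jitter = [1.0, -1.0] * 20               # point-to-point scatter
    print(smoothness_correlation(ramp), smoothness_correlation(jitter))
```

Under these assumed estimators a smooth drift gives a value near
one, while a pattern dominated by point-to-point scatter gives a
value near zero.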

3. Polynomial Mean Square Fits 

In the hope of quantifying raster pattern behavior
which system operators had described as "curve up," "curve
down," "flat," or "complex," a fourth order polynomial mean
square fit to the minimal phase difference raster data over
the entire signal period was attempted for each bias
condition. The fourth order fit was initially believed to be
capable of representing both transient and steady state
phenomena. Unfortunately, even the minimal phase difference
representation of the data could not eliminate aliasing
problems in those displays where apparently large initial
transients occurred. As an alternative, a second order fit
over that data lying outside the region of worst transient
behavior was eventually incorporated into the measurement
program.

The coefficients of the second order fit also serve
the purpose of reducing the sensitivity of the Fourier
transform data to phase offset and endpoint discontinuities which
result from minor adjustments in the nominal raster period
of the display.



As an example, consider the periodic Fourier series
transform of a pattern characterized by a line with non-zero
slope. As a result of endpoint discontinuity in a periodic
representation, the odd Fourier series components exhibit
amplitudes proportional to the magnitude of the discontinuity
and decreasing in frequency as 1/n.

It has previously been established that transient
phenomena, zero point offset, and differences between the
nominal and actual steady state period can produce patterns
with both offset and non-zero slopes. If the raster pattern
shows significant curvature, it is difficult to justify any
particular choice for the nominal period. Any choice of
nominal period creates its own contributions to the Fourier
series components. Since the nominal raster period of the
display is generally the result of an automatic measurement
procedure, and since it represents one of the signal features
currently used by the system's identification algorithm, the
interests of comparability were best served by partially
correcting for these discontinuities in terms of the mean
square fit coefficients. Only the difference between the
raster pattern and the polynomial mean square fit to each
bias condition was used as input to the fast Fourier transform
calculation.
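
The detrending step can be sketched as a second order least
squares fit followed by subtraction. The Python fragment below is
a minimal illustration, not the thesis subroutine: it solves the
3 by 3 normal equations directly, and the function names are
hypothetical.

```python
def fit_second_order(t, z):
    """Least squares fit z ~ c0 + c1*t + c2*t**2; returns (c0, c1, c2)."""
    power_sums = [sum(x ** p for x in t) for p in range(5)]
    a = [[power_sums[j + k] for k in range(3)] for j in range(3)]
    b = [sum(y * x ** j for x, y in zip(t, z)) for j in range(3)]
    n = 3
    for col in range(n):                  # elimination with pivoting
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    c = [0.0] * n
    for row in range(n - 1, -1, -1):      # back substitution
        tail = sum(a[row][k] * c[k] for k in range(row + 1, n))
        c[row] = (b[row] - tail) / a[row][row]
    return tuple(c)


def detrended(t, z):
    """Raster phase minus its second order fit (input for the transform)."""
    c0, c1, c2 = fit_second_order(t, z)
    return [y - (c0 + c1 * x + c2 * x * x) for x, y in zip(t, z)]


if __name__ == "__main__":
    t = [0.1 * i for i in range(20)]
    z = [1.0 + 2.0 * x + 3.0 * x * x for x in t]
    print(fit_second_order(t, z))
    print(max(abs(r) for r in detrended(t, z)))
```

Removing the fitted offset, slope and curvature leaves a residual
whose end points nearly agree, which reduces the spurious
components that an endpoint discontinuity would otherwise
contribute to a periodic transform.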

4. Fourier Power Spectrum

In order to observe the effects of possible unintentional
external modulation and periodic variation of the clock
rate, the use of Fourier power spectrum components was
proposed. The calculation of these terms presented an intriguing
problem, since the minimal phase raster data represent
non-uniform time samples of a presumably continuous signal. The
non-uniformity of sampling time arises from both the clock
rate variation and the fact that the transition times
themselves are data modulated. The presence of one transition
per baud is not the case; transitions occur at various
integer numbers of originator clock cycles.

The problem of non-uniform sampling is not too serious
if one intends to obtain only a few Fourier power series
components. The techniques for estimating amplitude, phase
and frequency of the principal Fourier components have been a
topic in recent literature [22]. One can easily construct
Fourier series of terms which are orthogonal on the interval
[0,L] by considering only the frequency terms which are
periodic in L.

    z(t) = \sum_{n=-\infty}^{\infty} c_n e^{i \omega_n t},
           \omega_n = \frac{2\pi n}{L}                        (V-18)

These complex components

    c_n = \frac{1}{L} \int_0^L z(t) e^{-i \omega_n t}\,dt     (V-19)

may be approximated step-wise
or by numerical integration involving higher order
interpolation. Although numerical integration of the above type is
possible, all frequency components up to the nominal clock
frequency were potentially of interest as features. An
alternate form of calculating the magnitudes of the power
spectral components, using the fast Fourier transform, was
adopted for computer implementation.

For each bias condition, this procedure typically
used about 75 non-uniformly spaced raster data points obtained
outside the transient region. For each condition, 128
sample points uniformly spaced over the same time interval
were chosen. The raster phase at these points was obtained
by an interpolation procedure which performed a cubic
polynomial fit to four minimal phase difference raster points,
two on either side of the interpolation point. Since the
fast Fourier transform routine used accepts complex values
as input, the interpolated functions for positive going and
negative going transitions were stored as the respective
real and imaginary parts of this input. The complex input
function may be expressed as:

    y(t) = f(t) + i\,g(t)                                     (V-20)

where the nth Fourier components of the real functions f and
g are:

    F_n(f) = a_n + i b_n                                      (V-21)

    F_n(g) = c_n + i d_n                                      (V-22)

The linearity of the Fourier transform and the requirement
that for a real function

    F_{-n}(f) = F_n^{*}(f)                                    (V-23)

implies that the nth Fourier transform components of both
functions f and g may be recovered from the real and
imaginary parts of F_n(y) and F_{-n}(y), the Fourier
components of the complex input [10]. Additionally it can be
shown that:

    [\mathrm{Re}\,F_n(y)]^2 + [\mathrm{Im}\,F_n(y)]^2
      + [\mathrm{Re}\,F_{-n}(y)]^2 + [\mathrm{Im}\,F_{-n}(y)]^2
      = 2 (a_n^2 + b_n^2 + c_n^2 + d_n^2)                     (V-24)

The Fourier transform information used as features in this
thesis was the squared magnitude of the frequency components
summed over both bias conditions in the manner shown above.
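
The packing of two real sequences into one complex transform can
be demonstrated with a short Python sketch. It is illustrative
only: a direct discrete Fourier transform stands in for the fast
Fourier transform routine, and the function names are
hypothetical.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform (stand-in for an FFT routine)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n)
                for m in range(n)) for k in range(n)]


def split_two_real(y_transform):
    """Recover the transforms of real f and g from that of f + i*g."""
    n = len(y_transform)
    f_t, g_t = [], []
    for k in range(n):
        y_pos = y_transform[k]
        y_neg = y_transform[(-k) % n].conjugate()
        f_t.append((y_pos + y_neg) / 2)      # transform of the real part
        g_t.append((y_pos - y_neg) / 2j)     # transform of the imaginary part
    return f_t, g_t


def summed_power(y_transform):
    """|F_k(f)|^2 + |F_k(g)|^2 obtained without separating f and g."""
    n = len(y_transform)
    return [(abs(y_transform[k]) ** 2
             + abs(y_transform[(-k) % n]) ** 2) / 2 for k in range(n)]


if __name__ == "__main__":
    f = [0.0, 1.0, 0.5, -0.25, 2.0, -1.0, 0.75, 0.1]
    g = [1.0, 0.2, -0.3, 0.4, 0.0, 0.6, -0.7, 0.8]
    y = dft([complex(a, b) for a, b in zip(f, g)])
    print([round(p, 6) for p in summed_power(y)])
```

The last function reflects the identity quoted above: the summed
squared magnitudes at positive and negative frequency of the
packed transform equal twice the summed power of the two real
inputs, so the feature can be formed without unpacking.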
Figures 12-22 show the behavior of the time
quantization interval, the minimal phase difference
interpolation, the second order polynomial fit, and the Fourier
transform features on a signal whose possible transition
times t_n are given by

    t_n = (nT + A \sin 2\pi f n T + B \exp(-anT))             (V-25)

The data points were selected from these possible transition
times by a simulated data modulation which repeated itself
about [number illegible] times in the duration of the display.
In the raster patterns the values of the second order
polynomial mean square fit and the interpolated raster phase
for one bias condition at the 128 points used for interpolation
are shown as the characters (+) and (*) respectively.
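
A test signal of this general form is easy to generate. The
Python sketch below is a hedged illustration: the parameter
values are arbitrary assumptions, the data modulation that
selects a subset of the possible transitions is omitted, and
quantization is modeled by truncating each time to the preceding
sampling instant.

```python
import math


def possible_transition_times(count, T, A, f, B, a):
    """Possible transition times: nominal clock plus a sinusoidal
    modulation and a decaying exponential transient."""
    return [n * T
            + A * math.sin(2.0 * math.pi * f * n * T)
            + B * math.exp(-a * n * T)
            for n in range(count)]


def quantize(times, dt):
    """Truncate each time to the preceding sampling instant (assumed)."""
    return [math.floor(t / dt) * dt for t in times]


def raster_phase(times, T):
    """Deviation of the nth time from the nth nominal instant n*T."""
    return [t - n * T for n, t in enumerate(times)]


if __name__ == "__main__":
    # Hypothetical values: 10 ms nominal period, 25 Hz modulation of
    # 0.25 ms amplitude, 4 ms initial transient, 0.5 ms sampling.
    times = possible_transition_times(50, 0.01, 0.00025, 25.0, 0.004, 5.0)
    for z in raster_phase(quantize(times, 0.0005), 0.01)[:8]:
        print(f"{z:.4f}")
```

Here the sinusoidal amplitude is half the sampling interval, the
situation in which the text reports that a 25 Hz component could
still be detected.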

The effect of the quantization interval is apparent
in all raster displays, but it is encouraging to note that
a 25 Hz frequency component of amplitude equal to one-half
the quantization interval (Figure 20) may be detected even
in the presence of a transient during the initial half second
of the signal sample. This detection ability, however, is
strongly dependent on the apparently uniform distribution of
transition times within the quantization interval. For this
deterministic case, as the raster period approaches an integer
multiple of the quantization interval, the ability of the
measurement technique to isolate a signal of small amplitude
as a single maximum deteriorates.

5. Laguerre Polynomial Coefficients 
In the attempt to characterize the apparent transient
behavior of some of the raster displays, a set of Laguerre
polynomial coefficients was calculated over all the points
z'(t_i) of the initial portion of the signal. The coefficients
L_n of the first five Laguerre polynomials were obtained by
stepwise numerical integration over the first 800 quantization
intervals.

    L_0 = \sum_{t_i < 800} z'(t_i) \exp[-p t_i] (\Delta t_i)

    L_1 = \sum_{t_i < 800} z'(t_i) [-2 p t_i + 1] \exp[-p t_i] (\Delta t_i)

    L_2 = \sum_{t_i < 800} z'(t_i) [2 (p t_i)^2 - 4 p t_i + 1]
          \exp[-p t_i] (\Delta t_i)

    L_3 = \sum_{t_i < 800} z'(t_i) [-\frac{4}{3} (p t_i)^3 + 6 (p t_i)^2
          - 6 p t_i + 1] \exp[-p t_i] (\Delta t_i)

    L_4 = \sum_{t_i < 800} z'(t_i) [\frac{2}{3} (p t_i)^4 - \frac{16}{3} (p t_i)^3
          + 12 (p t_i)^2 - 8 p t_i + 1] \exp[-p t_i] (\Delta t_i)      (V-26)

During the transient period, the aliasing effect,
which occurs if the nominal raster period does not match
the actual period, is especially pronounced. The data may
very well be undersampled in this region; thus no attempt was
made to compute these coefficients independently for each
bias condition, and the phase unwrapping procedure was applied
to all data points. This process, however, operated on the
premise that the phase difference from transition to
transition is not more than one-half the nominal raster period.
Even if both positive and negative transitions are used for
phase unwrapping, this premise may not be valid.

If the initial transition times predicted by the
steady state period and knowledge of the signal's underlying
data modulation can be determined, a more accurate
representation of transient phase might be obtained. Unfortunately,
knowledge of this data modulation was not available for the
purpose of this thesis.
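
The stepwise integration for the Laguerre coefficients can be
sketched as below. This Python fragment is an illustration under
stated assumptions, not the thesis subroutine: the weighting
polynomials are taken to be the first five Laguerre polynomials
evaluated at 2pt, multiplied by exp(-pt), and a simple rectangle
rule is used. The names are hypothetical.

```python
import math

# Laguerre polynomials L_n(2x) written in powers of x = p*t (assumed form).
LAGUERRE_TERMS = [
    lambda x: 1.0,
    lambda x: 1.0 - 2.0 * x,
    lambda x: 1.0 - 4.0 * x + 2.0 * x ** 2,
    lambda x: 1.0 - 6.0 * x + 6.0 * x ** 2 - (4.0 / 3.0) * x ** 3,
    lambda x: (1.0 - 8.0 * x + 12.0 * x ** 2
               - (16.0 / 3.0) * x ** 3 + (2.0 / 3.0) * x ** 4),
]


def laguerre_coefficients(t, z, p):
    """Rectangle-rule estimate of the first five coefficients.

    t : increasing sample times, z : phase values at those times,
    p : decay rate of the exponential weight.
    """
    coefficients = []
    for term in LAGUERRE_TERMS:
        total = 0.0
        for i in range(len(t) - 1):
            step = t[i + 1] - t[i]
            total += z[i] * term(p * t[i]) * math.exp(-p * t[i]) * step
        coefficients.append(total)
    return coefficients


if __name__ == "__main__":
    p, dt = 1.0, 0.001
    t = [i * dt for i in range(20001)]
    z = [math.exp(-p * ti) for ti in t]     # a pure decaying transient
    print([round(c, 4) for c in laguerre_coefficients(t, z, p)])
```

Because these weighted polynomials are mutually orthogonal, a
transient that decays exactly as exp(-pt) loads only the first
coefficient, which approaches 1/(2p), while the other four remain
near zero.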


D. COMPUTER FILES AND PROGRAMS

The programs used to produce the feature measurements
described in the previous section were designed to operate
on a pre-production model of the Parameter Encoder which
incorporated in its hardware a Hewlett-Packard 2100A computer
with twin disc drives and moving head disc operating system.
The normal graphics display and list device was a Tektronix
storage scope.

The programs discussed in this section, while written
in HP Fortran, make use of the Fortran-callable executive
routines available under the disc operating system. These
routines permit program overlays and disc file input/output.
Similarly, a library of Fortran-callable utility routines
developed at Electromagnetic Systems Laboratory to control
graphics input and output was used to create and erase the
scope displays, temporarily halt computations and provide
keyboard control of the measurement process.

The signal data base used as input to the measurement
process was stored in the disc file, PARF. This file
contained a signal index number, the automatically measured
clock rate, the time difference between successive signal
transitions and the total number of transition times
measured (see Table I). Since this data format differed
slightly from that normally accepted by the standard raster
display routine RASF, an interface program FACE was written
to convert the data.

Following the initial display of the raster signal,
process control could be transferred to a program FOURD
whose function was to call the interpolation, measurement,
and display subroutines and to store the measured parameters
in the disc file PAR. The overall process flow of FACE,
RASF and FOURD is shown in Figures 23 and 24.

1. [Subsection title not legible in the source scan]

The first subroutine called by FOURD is RESDU (see
Figures 25 and 26 for overall process flow). This subroutine
calculates a minimal phase difference representation
for the points of both bias conditions outside the transient
region. The points thus represented are used by the subroutine
CURV to calculate the coefficients of the polynomial
mean square fit. They are also used to calculate the signal's
mean bias and its total and intrinsic variance. A cubic
interpolation scheme CURF then provides the uniform samples
used by the fast Fourier transform subroutine. RESDU also
calls the subroutine LAGER, which performs a stepwise numerical
integration to estimate the coefficients of the first
five Laguerre polynomials.

FIGURE 23. Process Flow, Interface and Raster Display
Programs FACE and RASF

FIGURE 24. Process Flow, Measurement and Display
Program FOURD

FIGURE 25. Flow Diagram, Interpolation and Measurement
Subroutine RESDU

FIGURE 26. Minimal Phase Difference Algorithm

The two other subroutines called by FOURD are FOUR2,
a standard fast Fourier transform, and FORD, a routine which
calculates the average square magnitude of the Fourier
components and displays the results on the Parameter Encoder's
storage scope. The disc file, PAR, which contains all the
above feature measurements, is organized as shown in Table II.

2. Statistics and Formatting Routines 

The analysis of the features measured by the routines
indicated in FOURD makes extensive use of identification and
analysis software previously developed for the Parameter
Encoder. These procedures use yet another disc file structure,
MASTF (Table III), as their input data base. Additional
reformatting and selection of the parameters was necessary to allow
the use of this file since it accepts a maximum of 50
parameters.

The program CSTAT provided class statistics for the
measurements stored in MASTF. In addition to reformatting
individual signal parameters and storing them in the disc
file MASTF, the subroutine TRNSG performed the important
function of interpolating the existent Fourier magnitudes
to estimate the square magnitude components at 2 Hz
increments. This was necessary since the resolution of each signal
was variable. Ground truth classification of each signal
had been previously determined through the use of parameters
not included in this thesis, and was stored in a small disc
file, CSIGF.


E. ANALYSIS OF PARAMETERS 

From the basic group of features obtained by the measurement
process, a group of 50 parameters was selected for further
analysis. The analytic techniques used were those readily
available to the user of Parameter Encoder identification
software. Several iterations of measurement, observation,
and feature analysis were conducted before arriving at the
parameters indicated in Table IV. Earlier measurements had
involved the use of fourth order polynomial mean square fits
to the entire raster display, and the use of Fourier magnitude
coefficients of raster pattern input uncorrected by
the polynomial fit.

These earlier techniques appeared to provide reasonable
measurements for approximately 60 percent of the signals
considered, but failed catastrophically for the remaining
signals. The major cause of failure appeared to be the
inability of the minimal phase difference representation to
track the raster pattern through initial transients and
anomalous transition times. The final measurement process
discussed in the previous section was applicable to about
80 percent of the signals attempted and resulted in
measurements for the final training set of 74 signals distributed
among nine classes as shown in Table V.

The ground truth identification of these signals was
based on the agreement of two external identification
schemes. One of these was the semi-automated process
developed by Electromagnetic Systems Laboratory which uses
additional signal parameters not associated with the raster
display. The other process uses all-source information in
arriving at signal identifications.

The analysis began with the measurement of single feature
separating capability for each of the 50 selected
parameters. The eleven features showing the greatest
capability were then examined for redundant features by
calculation of class and global correlation matrices. Finally, the
error rate of a minimum distance classifier was used as the
criterion for a feature search procedure which combined single
feature ranking and search-without-replacement techniques.
Since the amount of training data was limited, even this
iterative procedure may place too great an emphasis on the
training set values.

1. Single Feature Separability

The single feature separability measure available
in the Parameter Encoder software is a modification of the
distance ratio techniques discussed in Appendix C.2. The
program FINTR calculates the average square distance between
points of m different classes,

    D_B^2 = [expression not legible in the source scan]       (V-27)

the m class average within-class variance,

    D_W^2 = [expression not legible in the source scan]       (V-28)

and uses the ratio

    F = \frac{D_B^2}{D_W^2}                                   (V-29)

as a measure of separability. Since the array storage
available in this program limits the number of input samples
to 60, the signals of classes 1 and 8 were not used in the
calculation of the feature distance ratios. The distance
ratios obtained using all signals of the other 7 classes
are shown in Table VI. The distance ratios of the frequency
components are also plotted in Figure 27.
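
A distance ratio of this general kind can be sketched in Python
as follows. The exact expressions used by FINTR are not
recoverable here, so the definitions below are assumptions: the
between-class term is the mean squared difference over all pairs
of points drawn from two different classes, averaged over class
pairs, and the within-class term is the average of the class
variances. The function name is hypothetical.

```python
def distance_ratio(classes):
    """Ratio of average between-class squared distance to average
    within-class variance for one feature.

    classes : list of lists, one list of feature values per class.
    """
    m = len(classes)
    between_total = 0.0
    pair_count = 0
    for i in range(m):
        for k in range(i + 1, m):
            squared = sum((x - y) ** 2
                          for x in classes[i] for y in classes[k])
            between_total += squared / (len(classes[i]) * len(classes[k]))
            pair_count += 1
    d_between = between_total / pair_count

    within_total = 0.0
    for values in classes:
        mean = sum(values) / len(values)
        within_total += sum((x - mean) ** 2 for x in values) / len(values)
    d_within = within_total / m

    return d_between / d_within


if __name__ == "__main__":
    print(distance_ratio([[0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]))
    print(distance_ratio([[-0.1, 0.0, 0.1], [9.9, 10.0, 10.1]]))
```

Under these assumed definitions two identical classes give a
ratio of two, so small ratios indicate a feature with little
separating power, while well separated classes give a ratio in
the thousands.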

By way of comparison, Figure 28 shows the distance ratio
as a function of frequency for data obtained in earlier
investigations using a smaller number of signals and classes.
The well defined maxima in the latter case can be attributed
to the fact that the signals and classes used were
particularly noise and transient free. In these cases one was able
to use the entire raster display uncorrected by a polynomial
fit to estimate the Fourier components.

If one assumes that the parameters are statistically inde- 
pendent, then the ranking of features according to distance 


ratio provides a guide for feature selection. Eliminating 







TABLE VI 


Single Distance Ratios for Parameters 901 - 950
i Ratio i! Ratio # Ratio 
901 On 59 921 Lape 94] 1 .Ae4 
902 te 6.0 O22 4 942 0S 
903 ive od 923 io 5 943 94 
904 OM 924 i 25 944 oe: 
905 JOY: 925 es 945 oo 
906 eke, a6 aos 946 64 
907 0 SPT Zo 947 4S) 5 
908 ies 928 leon 948 no 
909 joe ee, 929 LAO, 949 6 
910 02 930 1.44 Tre) £97, 
911 tO 2 oe 16.8 
912 ANS, 2 UG 
3 13} iOS oes res) 
914, .98 934 pela 
p15 5 os) oD ec 8 
016 0 926 1.44 
Lay, Lok4 ed Pes 
= 1s JE ELS: Ose I AS 
omg eee 939 ie02 
920 i 9 940 0 LE 



FIGURE 27. Feature Separation Quality (D_B^2 / D_W^2) vs.
Frequency (Hz), 59 signals, 7 classes

FIGURE 28. Feature Separation Quality (D_B^2 / D_W^2) vs.
Frequency (Hz), 30 signals, 4 classes, no polynomial fit
corrections





those features whose ratio was less than one and ranking 
the best eleven of the remaining resulted in the feature set 
shown in rank order in Table VII. 

2. Multiple Feature Analysis 

To test the assumption that the parameters selected
by single feature ranking were independent, a utility routine
PCORL was used to calculate their class and global correlation
coefficients. Two dimensional scattergrams for selected
pairs of highly ranked features were used for visual
confirmation of feature correlation and clustering. In the global
case, two of the parameters exhibited correlation coefficients
in excess of 0.9. These parameters, whose scattergram is
shown in Figures 29 and 30, are the nominal raster period and
the signal resolution. Correlation between these two
parameters might be expected to be high if the sample signal
length corresponds to the same number of bauds for all signals.
Both the scattergram and the class correlation coefficients
indicated that the period and signal duration remain highly
correlated on a class basis. Other scattergrams, such as that
shown in Figures 31 and 32 for the next most effective single
feature and the nominal raster period, gave little evidence
of correlation. The results of these calculations and
observations for other feature pairs were sufficiently encouraging
that single-feature distance ratios were used to choose the
next four features to be discarded from the set of eleven.

The six features finally chosen were selected by
repeated application of a minimum-normalized distance
classification program CNFUS. Sixty-two signals from all
classes except class four were classified by the program
according to their distance from the class means. This
distance d_k of a particular set of n signal features,
(x_1, ..., x_n), from the class mean (\mu_{1k}, ..., \mu_{nk})
was given by

    d_k^2 = \frac{1}{n} \sum_{j=1}^{n} \frac{(x_j - \mu_{jk})^2}{\sigma_{jk}^2}      (V-30)


The signals used for feature selection, their distance
to the nearest four class means, and the confusion matrix
resulting from the choice of the nearest class are shown in
Tables VIII and IX. This choice of six features leads to
approximately 56 percent agreement between ground truth data
and the first choice of the classifier when the entire
training set, including class four, is used. This, of course,
represents an optimistic estimate of the probability of
correct classification. Further testing of the classifier
over an independent set of signal data is advisable. That
the addition of features other than the basic clock rate does
improve classifier performance is evident in the improvement
over the 34 percent classification probability obtained using
only this feature as the basis for classification. The
resultant single feature confusion matrix is shown in Table X.
TABLE IX

CONFUSION MATRIX FOR CLASSIFIER USING
PARAMETERS 902, 903, 927, 926, 93[?], 918

                        CLASSIFIER CHOICE

Ground                                                  # missed
truth      1    2    3    4    5    6    7    8    9    (Type I
class                                                    error)
  1        8    -    -    -    -    -    -    -    -       0
  2        1    2    -    -    -    -    -    -    -       1
  3        1    -    7    5    -    -    -    2    -       8
  4        -    -    -    -    -    -    -    -    -       0
  5        -    -    -    -    3    -    -    1    -       1
  6        1    -    -    -    -    3    -    1    -       2
  7        3    -    -    -    -    -    7    -    -       3
  8        1    -    -    1    -    1    -    4    -       3
  9        3    -    1    2    -    -    -    1    3       7

# of excess
(Type II  10    0    1    8    0    1    0    5    0
error)

GOOD = 37     MISSED = 25

59% correct classification over training set.
TABLE X


CONFUSION MATRIX FOR CLASSIFIER USING CLOCK RATE ONLY


                       CLASSIFIER CHOICE

                                                        # missed
 TRUE                                                   (Type I
 CLASS     1    2    3    4    5    6    7    8    9     Error)
   1       8    -    -    -    -    -    -    -    -       0
   2       -    3    -    -    -    -    -    -    -       0
   3       1    -    5    3    -    -    -    5    1      10
   4       1    -    -    -    -    -    -    -    -       1
   5       -    -    -    1    -    1    1    -    -       3
   6       1    -    -    3    -    1    -    -    -       4
   7       -    -    1    2    1    2    -    -    4      10
   8       2    -    1    -    -    1    -    3    -       4
   9       -    -    1    2    2    -    3    1    1       9
 # of
 excess    5    0    3   11    3    4    4    6    5
 (Type II
 Error)


GOOD = 21    MISSED = 41


33.87% correct classification over training set.
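
The tallies printed with the confusion matrices can be checked mechanically. The sketch below assumes the entries of Table IX are read as listed in `table_ix` (rows are true classes 1 to 9, columns are classifier choices 1 to 9); the function name `tally` is hypothetical.

```python
# Entries of Table IX as read here; a dash in the table is a zero.
table_ix = [
    [8, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 2, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 7, 5, 0, 0, 0, 2, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 3, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 6, 0, 1, 0],
    [3, 0, 0, 0, 0, 0, 4, 0, 0],
    [1, 0, 0, 1, 0, 1, 0, 4, 0],
    [3, 0, 1, 2, 0, 0, 0, 1, 3],
]


def tally(matrix):
    """Return (good, missed per true class, excess per chosen class, rate)."""
    k = len(matrix)
    good = sum(matrix[i][i] for i in range(k))
    # Type I: signals of a class that were assigned to some other class.
    missed = [sum(row) - row[i] for i, row in enumerate(matrix)]
    # Type II: signals wrongly assigned to a class from other classes.
    excess = [sum(matrix[r][c] for r in range(k)) - matrix[c][c]
              for c in range(k)]
    total = sum(sum(row) for row in matrix)
    return good, missed, excess, good / total


good, missed, excess, rate = tally(table_ix)
print(good, sum(missed), round(100 * rate, 2))
```

Row sums of off-diagonal entries give the Type I column and column sums give the Type II row, so the two totals must agree with the MISSED count.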
The sequence by which the final six features were
obtained involved first the calculation of the classifier
confusion matrix as each of the four lowest ranked features
was removed from the set of eleven. Their removal improved
the classifier's performance from 50 percent to 58 percent.
From this set of seven features, removing one parameter and
using the remaining six yielded the following classifier
performance:


Parameter Removed     Probability of Correct Classification
None                  58%
918                   25%
oon                   53%
926                   50%
903                   Sis
Oe,                   Soh 4
901                   59%
902                   56%


The above suggests that the best single feature to be removed
at this stage is 901, the nominal raster period. That this
remaining group of six features is not necessarily the
optimal choice can be shown by noting that a choice of
five features, including parameter 901, leads to a 61 percent
probability of correct identification, shown in Table XI.


TABLE XI


61.29% CORRECT CLASSIFICATION OVER TRAINING SET
VI. SUMMARY OF CONCLUSIONS AND RECOMMENDATIONS


It has not been the intent of this thesis to suggest 
feature measurements which can replace those currently used 
in the Parameter Encoder's source identification scheme. The
fact that the measurement process discussed in this thesis 
is currently unable to provide meaningful measurements for 
about 20% of the signal data attempted is but one reason for 
not adopting the indicated Fourier components and polynomial 
coefficients as standard features to be measured. 

These features, however, constitute a hitherto unexploited
source of raster pattern information. The use of four of
these features in conjunction with the nominal raster period
or signal clock period provides approximately 100% improvement
in classifier capability over the separation performance
of the nominal raster period alone. In view of the amount of
overlap in the feature distributions for several classes,
reflected in both the scattergrams and the distances from
the individual signals to the nearest class means, it is not
advisable to use more than four or five such features. Neither
is it advisable to draw too firm a conclusion as to the
optimal choice of these new features. The 62 test and training
signals do not by any stretch of the imagination constitute
a completely representative statistical base.

It is rather surprising that those terms intended to
characterize the mean bias, variance measures and transient
phenomena show as little single-feature separating capability
as they do, since it was precisely these phenomena which have
been used by system operators as visual aids to signal
identification. While one might reasonably associate the zero
order polynomial mean square fit coefficients and the zero-Hz
Fourier component with transient related effects, the
higher-order Fourier series terms have not previously been
noted as particularly prominent in the raster signal display.
In view of the series of nonlinearities present in obtaining
the interpolated minimal phase raster points, it is not
impossible that these higher order terms owe their separating
capability to the data modulation of the signal source. In
this respect the "sideband" structure of the single feature
separation data of Figure 27 is most intriguing.

Before completely discounting the usefulness of bias
measures and transient phenomena, a more careful
representation and analysis of the data in the transient region
should be attempted. The representation could perhaps involve the
calculation of the minimal phase by use of a moving-average
technique which works backward from data in the steady-state
region to the starting time of the signal. At the same time,
use of a different decay constant and a smoother integration
to obtain the Laguerre polynomial coefficients might also
prove of value.

If additional accurate ground truth data is available,
the modeling of the probability distributions by something
other than a Gaussian form could be useful. The average
squared magnitude of the Fourier components might be more
accurately represented by a Rayleigh distribution. Although
some of the measured parameters for one or more classes may
not have a unimodal distribution, the number of accurately
labeled samples currently available provides little
encouragement for the use of Parzen window probability estimators.

The foregoing comments suggest that four or five parameters
characterizing the raster scan data may provide signal
identifying information useful to about the same degree of
accuracy as visual interpretation of the raster scan display.
Since the optimal parameters measured in this investigation do
not appear to correspond to the visual aids used by system
operators, one has reason to believe that other parameters
derived in subsequent analysis might improve upon this
degree of accuracy.

The measurement technique investigated exists outside the
mainstream of the Parameter Encoder's identification process.
As such it constitutes a relatively slow procedure (one minute
per signal), requiring the use of disc data files which
contain much of the same information available directly from
the master file structure. Even should the programs used
be modified to work solely with the master file, the data
obtained might be of greatest use in a sequential
identification scheme. It could be used much as the present
raster scan display is: to resolve ambiguities in the choice
between a relatively small number of classes.
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APPENDIX A 


METHODS OF PROBABILITY ESTIMATION 


1. Parametric Techniques

Perhaps the most common technique which has been used
to infer the a priori probability structure presupposes that
the distribution of features in a given class is multivariate
normal. As a first step the mean and variance for each feature
of a class are estimated. For the class $C_i$ having samples
$(x_1^{(i)}, \ldots, x_{K_i}^{(i)})$, these estimates may be:

$$ \mu^{(i)} = \frac{1}{K_i} \sum_{j=1}^{K_i} x_j^{(i)} \tag{A-1} $$

$$ \sigma_k^2 = \frac{1}{K_i} \sum_{j=1}^{K_i} \left(x_{jk} - \mu_k\right)^2 \tag{A-2} $$

where $x_{jk}$ and $\mu_k$ are components of $x_j^{(i)}$ and
$\mu^{(i)}$. The variance vector represents a simplification
from the more general multivariate form:

$$ p(x|C_i) = \frac{1}{(2\pi)^{d/2} |S_i|^{1/2}} \exp\left[-\tfrac{1}{2}\left(x - \mu^{(i)}\right)^T S_i^{-1} \left(x - \mu^{(i)}\right)\right] \tag{A-3} $$


The components of the variance vector $\sigma^2$ are the
diagonal elements of the d x d covariance matrix, which may
be estimated by:

$$ S_i = \frac{1}{K_i - 1} \sum_{j=1}^{K_i} \left(x_j^{(i)} - \mu^{(i)}\right)\left(x_j^{(i)} - \mu^{(i)}\right)^T \tag{A-4} $$
Under the assumption of independent features, the
distribution is characterized by the product of d Gaussian
distributions:

$$ p(x|C_i) = \prod_{k=1}^{d} \frac{1}{(2\pi)^{1/2} \sigma_k} \exp\left[-\frac{1}{2\sigma_k^2}\left(x_k - \mu_k\right)^2\right] \tag{A-5} $$

If required, the technique may be extended to include the
estimation of a covariance matrix for each class by use of
Equations A-3 and A-4. This procedure has enjoyed great
popularity, particularly in the modeling of features with
unimodal class conditional probability distributions. The
ease with which the probability measure of a Gaussian
distribution may be interpreted as a weighted distance measure
has been an important factor in its retention.
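
As a rough illustration of the parametric procedure, the sketch below estimates the per-feature mean and variance of one class and evaluates the independent-feature Gaussian product. It assumes the 1/K variance normalization; the function names and the sample values are hypothetical.

```python
import math


def estimate_mean_var(samples):
    """Per-feature mean and variance (1/K normalization) of one class."""
    k = len(samples)
    d = len(samples[0])
    mean = [sum(s[j] for s in samples) / k for j in range(d)]
    var = [sum((s[j] - mean[j]) ** 2 for s in samples) / k for j in range(d)]
    return mean, var


def independent_gaussian_density(x, mean, var):
    """Product of d one-dimensional Gaussian densities."""
    p = 1.0
    for xj, mj, vj in zip(x, mean, var):
        p *= math.exp(-0.5 * (xj - mj) ** 2 / vj) / math.sqrt(2.0 * math.pi * vj)
    return p


# Two hypothetical training samples of a two-feature class.
mean, var = estimate_mean_var([[1.0, 2.0], [3.0, 6.0]])
peak = independent_gaussian_density(mean, mean, var)
print(mean, var, peak)
```

Taking the logarithm of the density turns the product into a sum of variance-weighted squared differences, which is the weighted distance interpretation mentioned above.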
2. Non-Parametric Techniques

Another technique for obtaining the a priori
probability density function involves its direct
estimation through the use of window functions or potential
functions. In this technique, the estimate at some point x
in the d-dimensional feature space is obtained by the
superposition of terms which are generalized distance functions
of that point and the training set points $x_j^{(i)}$ of the
class $C_i$:

$$ \hat{p}(x|C_i) = \frac{1}{K_i} \sum_{j=1}^{K_i} \gamma\left(x, x_j^{(i)}\right) \tag{A-6} $$
The technique arises from a viewpoint proposed by
Parzen. In one-dimensional cases, one can estimate the
probability density at some point x by counting the number
of samples $N_h(x)$ in the interval $[x-h, x+h]$ and dividing
by the product of the total number of samples M and the
interval width 2h:

$$ \hat{p}_M(x) = \frac{N_h(x)}{2hM} \tag{A-7} $$

If one defines a "window" function

$$ K(y) = \begin{cases} \tfrac{1}{2}, & |y| \le 1 \\ 0, & |y| > 1 \end{cases} \tag{A-8} $$

one can write

$$ \hat{p}_M(x) = \frac{1}{Mh} \sum_{j=1}^{M} K\left(\frac{x - x_j}{h}\right) \tag{A-9} $$

This corresponds to the potential function

$$ \gamma(x, y) = \frac{1}{h} K\left(\frac{x - y}{h}\right) \tag{A-10} $$
Additionally, the argument presented may be extended to
multiple dimensions by considering

$$ K(u) = \begin{cases} 1, & |u_j| \le \tfrac{1}{2} \text{ for } j = 1, \ldots, d \\ 0, & \text{otherwise} \end{cases} $$

$$ V_M = h_M^d \tag{A-11, A-12} $$

where $V_M$ is the multidimensional volume of the hypercube of
side $h_M$.

The window function K need not be the exact form of
Equation A-11. It can be shown that if the estimate
$\hat{p}_M(x)$ is a function both of the total number of
samples M and $V_M = V(M)$, the conditions

$$ \lim_{M \to \infty} V_M = 0 $$

$$ \lim_{M \to \infty} M V_M = \infty \tag{A-13 to A-16} $$

ensure that as the number of total samples $M \to \infty$

$$ \lim_{M \to \infty} \hat{p}_M(x) = p(x) \tag{A-17} $$

$$ \lim_{M \to \infty} E\left[\left(\hat{p}_M(x) - p(x)\right)^2\right] = 0 \tag{A-18} $$

Thus, we can consider $\hat{p}_M(x)$ to be a blurred, or "noisy,"
version of p(x) as seen through an averaging window.
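
A minimal one-dimensional sketch of the window estimate follows. It assumes the rectangular window takes the value 1/2 on the closed interval [-1, 1], so that the sum of windows reproduces the count of samples within h of x divided by 2hM; the names `window` and `parzen_estimate` and the sample values are hypothetical.

```python
def window(y):
    """Rectangular window: 1/2 inside [-1, 1], zero outside."""
    return 0.5 if abs(y) <= 1.0 else 0.0


def parzen_estimate(x, samples, h):
    """Average of scaled windows centred on the training samples."""
    m = len(samples)
    return sum(window((x - xj) / h) for xj in samples) / (m * h)


samples = [0.1, 0.4, 0.5, 0.9, 2.0]
estimate = parzen_estimate(0.5, samples, 0.5)
print(estimate)  # four of the five samples lie within 0.5 of x = 0.5
```

Shrinking h sharpens the estimate but leaves fewer samples under each window, which is the trade-off behind the two limiting conditions on the window volume.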
Intuitively, one can describe the following useful
properties of the potential function $\gamma(x, y)$:

1. $\gamma(x, y)$ should be maximum for x = y.
2. $\gamma(x, y)$ should be approximately zero for x distant
   from y.
3. $\gamma(x, y)$ should be continuous and decrease monotonically
   with the distance between y and x.
4. If $\gamma(x_1, y) = \gamma(x_2, y)$, where y is a sample
   point, the patterns represented by $x_1$ and $x_2$ should
   have about the same similarity to y.

Such properties can be obtained with a potential function
of the form:

$$ \gamma(x, y) = \frac{1}{(2\pi)^{d/2} \prod_{i=1}^{d} \sigma_i} \exp\left[-\frac{1}{2} \sum_{i=1}^{d} \frac{(x_i - y_i)^2}{\sigma_i^2}\right] \tag{A-19} $$

where $\sigma_i$ are arbitrarily chosen values. Although the form
here is Gaussian, and the function of a distance measure,
it is also possible to construct a potential function of
the form:

$$ \gamma(x, y) = \sum_{i} \lambda_i^2 \phi_i(x) \phi_i(y) \tag{A-20} $$

where $\{\phi_i\}$ is a complete set of multivariate orthonormal
functions and $\lambda_i$ are constants. Such techniques have been
described more completely in the work of Aizerman and
Braverman and others [1, 7, 19].
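
The Gaussian potential function can be tried out directly. The sketch below assumes a normalizing factor chosen so that each potential integrates to one, and averages the potentials over the training points of a class; the function names and the spread values in `sigma` are hypothetical.

```python
import math


def gaussian_potential(x, y, sigma):
    """Gaussian potential centred on y with per-feature spreads sigma."""
    d = len(x)
    norm = (2.0 * math.pi) ** (d / 2.0) * math.prod(sigma)
    q = sum(((xi - yi) / si) ** 2 for xi, yi, si in zip(x, y, sigma))
    return math.exp(-0.5 * q) / norm


def potential_estimate(x, training, sigma):
    """Class density estimate: mean potential over the training points."""
    return sum(gaussian_potential(x, y, sigma) for y in training) / len(training)


training = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
sigma = [0.5, 0.5]
near = potential_estimate([0.2, 0.2], training, sigma)
far = potential_estimate([5.0, 5.0], training, sigma)
print(near, far)
```

The spreads play the role of the window width: larger values smooth the estimate, smaller values make it follow individual training points.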

Given sufficient samples, the Parzen window or
potential function approach essentially assures satisfactory
convergence to an arbitrarily complex distribution.
Unfortunately, this sufficient number of samples may be far
greater than the number required if the form of the
distributions were known. Since every sample point is used in
the construction of the density function, the above approach
affords little economy in the way of data reduction and leads to
a demand for computation time and storage space exponentially
increasing with the number of features.
APPENDIX B


CLASSIFICATION ALGORITHM


1. Distance Based Classification Schemes

An assumption commonly used to implement a decision
rule is that of multivariate normal class dependent density
functions of the form:

$$ p(x|C_i) = \frac{1}{(2\pi)^{d/2} |S_i|^{1/2}} \exp\left[-\tfrac{1}{2}\left(x - \mu^{(i)}\right)^T S_i^{-1} \left(x - \mu^{(i)}\right)\right] \tag{B-1} $$

where

$x$ is the d-component feature vector
$\mu^{(i)}$ is the d-component class mean vector
$S_i$ is the d x d class covariance matrix
$|S_i|$ is the determinant of $S_i$.

For such a distribution one may obtain the locus of
points of constant density as a hyperellipsoid for which
the quadratic form $(x - \mu^{(i)})^T S_i^{-1} (x - \mu^{(i)})$
is constant. The quantity

$$ \left(x - \mu^{(i)}\right)^T S_i^{-1} \left(x - \mu^{(i)}\right) \tag{B-2} $$

is sometimes called the Mahalanobis distance from $x$ to $\mu^{(i)}$.
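Since the probability densities are always non-negative,
the maximum likelihood criterion

$$ p(C_i)\, p(x|C_i) > p(C_j)\, p(x|C_j) \quad \text{for all } j \ne i \tag{B-3} $$

implies

$$ \log p(C_i)\, p(x|C_i) > \log p(C_j)\, p(x|C_j) \tag{B-4} $$

Substituting the multivariate density from Equation
B-1, the above expression yields

$$ \log p(C_i) - \log V_i - \tfrac{1}{2}\left(x - \mu^{(i)}\right)^T S_i^{-1} \left(x - \mu^{(i)}\right) = \max_j \left\{ \log p(C_j) - \log V_j - \tfrac{1}{2}\left(x - \mu^{(j)}\right)^T S_j^{-1} \left(x - \mu^{(j)}\right) \right\} \tag{B-5} $$

where $V_i = (2\pi)^{d/2} |S_i|^{1/2}$ is the normalizing factor of
Equation B-1. The above equation defines the decision boundaries
of each class. If a common covariance matrix $S_i = S_j = S$
is assumed for each class in the above equation, the $\log V_i$
terms may be ignored and the form of the decision boundary
becomes

$$ \log p(C_i) - \tfrac{1}{2}\left(x - \mu^{(i)}\right)^T S^{-1} \left(x - \mu^{(i)}\right) = \max_j \left\{ \log p(C_j) - \tfrac{1}{2}\left(x - \mu^{(j)}\right)^T S^{-1} \left(x - \mu^{(j)}\right) \right\} \tag{B-6} $$

Similarly, if equiprobable a priori class probabilities
are assumed, $p(C_i) = p(C_j)$, then the decision rule
becomes

$$ \left(x - \mu^{(i)}\right)^T S^{-1} \left(x - \mu^{(i)}\right) = \min_j \left\{ \left(x - \mu^{(j)}\right)^T S^{-1} \left(x - \mu^{(j)}\right) \right\} \tag{B-7} $$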


This is equivalent to simply assigning the feature vector
x to that class whose pattern, $\mu^{(i)}$, is at the minimum
Mahalanobis distance from x. The concept of Mahalanobis
distance is particularly applicable to well-separated
unimodal feature distributions, but presents a considerable
problem in those cases where the distributions exhibit
more than one local maximum.
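
The minimum-distance rule for a common covariance matrix and equal priors can be sketched as follows. The code avoids forming the inverse explicitly by solving a linear system; `solve`, `mahalanobis_sq`, `nearest_class`, and the example means and covariance are hypothetical names and values chosen for illustration.

```python
def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def mahalanobis_sq(x, mean, cov):
    """Quadratic form (x - mean)^T cov^{-1} (x - mean)."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    return sum(di * zi for di, zi in zip(diff, solve(cov, diff)))


def nearest_class(x, means, cov):
    """Assign x to the class mean at minimum Mahalanobis distance."""
    return min(means, key=lambda c: mahalanobis_sq(x, means[c], cov))


cov = [[4.0, 0.0], [0.0, 0.25]]          # common covariance (assumed)
means = {"A": [0.0, 0.0], "B": [3.0, 1.0]}
choice = nearest_class([2.0, 0.0], means, cov)
print(choice)
```

In this example the point is closer to class B in the ordinary Euclidean sense, but the small variance of the second feature makes class A the nearer one under the Mahalanobis measure.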

2. K-Nearest Neighbor Algorithms

The concept of distance may be applied in a slightly
different form to probability functions not well characterized
by the multivariate normal form. In this application,
distances in feature space are quite frequently normalized by
some form of global variance or maximal spread. The
normalized Euclidean distance from the sample point x to each
point of the training set $(x_1, \ldots, x_K)$ is
calculated and ranked in order of increasing distance. In
the simplest algorithm, class membership of the sample point
is determined by the class membership of the greatest number
of the k-nearest neighbors in the training set. For example,
suppose that of the ten points of the training set nearest
to the test sample, three were members of class 1, four of
class 2, and one each of classes 4, 5, and 6. The
simplest algorithm would choose class 2 as the probable
classification. Alternative algorithms of this type weight
the contribution of the k-nearest training points according
to their distance ranking. The nearest point might be
assigned a weighted class vote of 10, the next nearest a
weight of 9, and so forth. Class membership of the sample
point would then be determined by summing the weighted class
votes of the ten nearest neighbors and choosing that class
which received the maximum number of votes.
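
Both voting schemes can be illustrated with the ten-neighbor example. The sketch assumes the neighbors have already been ranked by distance; the function name `knn_vote` and the particular ordering of class labels in `neighbors` are hypothetical.

```python
from collections import Counter


def knn_vote(neighbor_classes, weighted=False):
    """Choose a class from the labels of the k nearest training points.

    neighbor_classes lists the class labels nearest first. With
    weighted=True the nearest point casts k votes, the next k-1, and so on.
    """
    k = len(neighbor_classes)
    votes = Counter()
    for rank, label in enumerate(neighbor_classes):
        votes[label] += (k - rank) if weighted else 1
    return votes.most_common(1)[0][0], dict(votes)


# Three members of class 1, four of class 2, one each of classes 4, 5, 6.
neighbors = [1, 1, 2, 1, 2, 4, 2, 5, 2, 6]
simple_choice, simple_votes = knn_vote(neighbors)
ranked_choice, ranked_votes = knn_vote(neighbors, weighted=True)
print(simple_choice, ranked_choice)
```

With this ordering the simple majority favors class 2, while the rank-weighted vote favors class 1 because its members are the closest neighbors.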

Such procedures are reminiscent of the Parzen window
estimators described in Appendix A.2, and in fact can be
shown to be equivalent if, instead of rank order, distance
weighted contributions to the class vote are allowed.
Numerous techniques [3, 4, 10] exist to increase the computational
efficiency of these k-nearest neighbor algorithms by reducing
the number of Euclidean distance terms which must be
calculated for each sample point. It must be mentioned
that these techniques are of greatest use when the individual
classes are equiprobable or, at the very least, when every
class of the training set has more than k elements.
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APPENDIX C 


FEATURE SELECTION AND RANKING 


1. Linear Combinations of Features

The objectives of low dimensionality and retention
of sufficient information may often be obtained through the
use of custom orthonormal transformations of the measurement
space [13, 15, 19, 20]. In such an interpretation the sample
data is presumed to be adequately represented by the linear
combination of some finite set of functions $\phi_1, \ldots, \phi_n$
where

$$ \phi_i = (\phi_{i1}, \ldots, \phi_{im}) \tag{C-1} $$

For a given set of functions the weights $a_i$ are adjusted so
that the average mean square error

$$ \epsilon^2 = \frac{1}{m} \sum_{k=1}^{m} \left(z_k - \sum_{i=1}^{n} a_i \phi_{ik}\right)^2 \tag{C-2} $$

is minimized.
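The values $a_i$ are thus determined by taking partial
derivatives with respect to $a_i$ and setting the result equal
to zero, obtaining n equations in n unknowns of the form

$$ \sum_{j=1}^{n} a_j \sum_{k=1}^{m} \phi_{ik} \phi_{jk} = \sum_{k=1}^{m} z_k \phi_{ik}, \quad i = 1, \ldots, n \tag{C-3} $$

Now if the functions $\phi_1, \ldots, \phi_n$ are orthonormal; that is, if

$$ \sum_{k=1}^{m} \phi_{ik} \phi_{jk} = \begin{cases} 0, & i \ne j \\ 1, & i = j \end{cases} \tag{C-4} $$

Equation C-3 reduces to

$$ a_i = \sum_{k=1}^{m} z_k \phi_{ik} \tag{C-5} $$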


The values of $a_i$ may then be regarded as the components of the
representation of the vector z under the linear transformation

$$ a = T^T z \tag{C-6} $$

$$ T = [\phi_1, \phi_2, \ldots, \phi_n] $$

Expansion of a function by a Fourier series represents one
form of orthonormal transformation, but one which has
distinct advantages in computational efficiency.

It can be shown that the Karhunen-Loève expansion of
N m-dimensional samples $z_i = (z_{i1}, \ldots, z_{im})$ leads to an
m x m eigenvalue problem.
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where 


The above solution does in fact minimize mean square error
averaged over the entire sample set. By choosing
eigenvectors corresponding to the n largest eigenvalues, one can
achieve dimensionality reduction from m to n. This approach,
also known as principal factor analysis, requires the
computation of the eigenvectors and eigenvalues of a large (m x m)
matrix. Note that if the number of signals N is less than
m, there will be at most (N-1) non-zero eigenvalues, and the
minimum mean square error is zero.
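
The eigenvalue computation can be illustrated on a very small case. The sketch below assumes the matrix of the eigenvalue problem is the sample covariance matrix of the data (mean removed), and it finds only the dominant eigenpair by power iteration, a technique substituted here for brevity. The names `covariance` and `principal_axis` are hypothetical, and the iteration presumes the starting vector is not orthogonal to the dominant eigenvector.

```python
import math


def covariance(samples):
    """Sample mean and m x m covariance matrix (1/N normalization)."""
    n, m = len(samples), len(samples[0])
    mean = [sum(s[k] for s in samples) / n for k in range(m)]
    cov = [[sum((s[a] - mean[a]) * (s[b] - mean[b]) for s in samples) / n
            for b in range(m)] for a in range(m)]
    return mean, cov


def principal_axis(cov, iterations=100):
    """Largest eigenvalue and unit eigenvector by power iteration."""
    m = len(cov)
    v = [1.0 / math.sqrt(m)] * m
    lam = 0.0
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        lam = math.sqrt(sum(wi * wi for wi in w))
        if lam == 0.0:
            break
        v = [wi / lam for wi in w]
    return lam, v


# Three samples lying on one line: only one non-zero eigenvalue is possible.
mean, cov = covariance([[-1.0, -2.0], [0.0, 0.0], [1.0, 2.0]])
lam, v = principal_axis(cov)
print(lam, v)
```

Because the three samples are collinear, a single eigenvector represents them with zero mean square error, in line with the remark about the number of non-zero eigenvalues.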

A simpler procedure is to choose a small number of
linearly independent patterns typical of the different classes
or types of samples; then apply the Gram-Schmidt
orthonormalization process to them directly. In this process
one begins with the "typical" samples, perhaps derived from
class means,

$$ \mu^{(1)} = \left(\mu_1^{(1)}, \ldots, \mu_m^{(1)}\right) $$
$$ \vdots $$
$$ \mu^{(n)} = \left(\mu_1^{(n)}, \ldots, \mu_m^{(n)}\right) $$

and constructs the orthonormal functions $\phi_i$ in the following
manner:

$$ \phi_1 = \frac{\mu^{(1)}}{\left\|\mu^{(1)}\right\|} $$

$$ \phi_2 = \frac{\mu^{(2)} - \left(\mu^{(2)}, \phi_1\right)\phi_1}{\left\|\mu^{(2)} - \left(\mu^{(2)}, \phi_1\right)\phi_1\right\|} $$

$$ \phi_k = \frac{\mu^{(k)} - \sum_{i=1}^{k-1} \left(\mu^{(k)}, \phi_i\right)\phi_i}{\left\|\mu^{(k)} - \sum_{i=1}^{k-1} \left(\mu^{(k)}, \phi_i\right)\phi_i\right\|} $$

where

$$ (u, v) = \sum_{i=1}^{m} u_i v_i $$

Clearly, all of the samples used to generate the orthonormal
vectors may be represented exactly by an n-term expansion;
thus the representation error of the "typical" sample is
minimized. If these samples are truly typical, one can
expect other samples to be represented closely by the
orthonormal basis functions.

The dimensionality of the resultant pattern space
is the same as the number of linearly independent typical
samples.
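
A short sketch of the orthonormalization follows. It uses the classical Gram-Schmidt projection and skips any input that is (numerically) a linear combination of earlier ones, so the number of basis vectors returned equals the number of linearly independent inputs. The function names, the tolerance `tol`, and the example vectors are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two m-component vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def gram_schmidt(vectors, tol=1e-9):
    """Orthonormal basis spanning the given 'typical' sample vectors."""
    basis = []
    for u in vectors:
        w = list(u)
        for phi in basis:
            c = dot(u, phi)                      # component of u along phi
            w = [wi - c * pi for wi, pi in zip(w, phi)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:                           # skip dependent vectors
            basis.append([wi / norm for wi in w])
    return basis


# The third vector is the sum of the first two, so only two functions result.
basis = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [2.0, 1.0, 1.0]])
print(len(basis))
```

Each original sample is then recovered exactly from its expansion coefficients, the inner products with the basis vectors.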

2. Criteria for Feature Selection and Ranking

Rather than achieving a pattern space through use
of a transformation which minimizes mean square error over
the training set, it is often desirable to optimize some
measure of separation quality. Meisel [14] catalogs more
than ten different measures of separation quality which are
directly calculable from general a priori conditional density
functions. For unimodal distributions, however, it is
convenient to describe such a measure in terms of average
distances in the feature space.

Two distance measures need to be considered. One,
the inter-class distance, might be characterized by either
the distance between the means of the classes or the average
squared distances between points of different classes.
Another, the intra-class distance, is the measure of the
internal scatter of samples and may be characterized by an
average squared distance between the points of each class
and summed over the classes. In the two-class problem, a
common definition of the inter-class distance, S, and
intra-class distance, R, is


where $d^2(x, y)$ is some distance measure of the vector
variables x, y.

Conventional distance measures might use one of
the following forms:

Euclidean:           $d_1^2(x, y) = \sum_{i=1}^{n} (x_i - y_i)^2$

city block:          $d_2(x, y) = \sum_{i=1}^{n} |x_i - y_i|$

Mahalanobis:         $d_3^2(x, y) = (x - y)^T S^{-1} (x - y)$

localized distance:  $d^*(x, y) = 1 - \exp\left[-\alpha\, d^2(x, y)\right]$        C(12-15)
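
The four distance forms can be compared numerically with a few lines of code. In this sketch the Mahalanobis form takes the inverse covariance matrix as an argument, and the localized distance is assumed to be built on the squared Euclidean distance with a free scale constant `alpha`; all function names and example values are hypothetical.

```python
import math


def euclidean_sq(x, y):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def city_block(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def mahalanobis_sq(x, y, s_inv):
    """Quadratic form (x - y)^T S^{-1} (x - y), with S^{-1} supplied."""
    diff = [a - b for a, b in zip(x, y)]
    return sum(diff[i] * s_inv[i][j] * diff[j]
               for i in range(len(diff)) for j in range(len(diff)))


def localized(x, y, alpha):
    """Saturating distance: 0 for identical points, near 1 for distant ones."""
    return 1.0 - math.exp(-alpha * euclidean_sq(x, y))


x, y = [1.0, 2.0], [4.0, 6.0]
print(euclidean_sq(x, y), city_block(x, y),
      mahalanobis_sq(x, y, [[0.25, 0.0], [0.0, 1.0]]), localized(x, y, 0.1))
```

The localized form bounds the contribution of any one pair of points, so a few widely separated samples cannot dominate an average distance.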


In generalizing this distance criterion to an m-class problem,
it is convenient to resort to the concept of scatter matrices,
where

$$ S_i = \sum_{k=1}^{K_i} \left(x_k^{(i)} - \mu^{(i)}\right)\left(x_k^{(i)} - \mu^{(i)}\right)^T, \qquad S_W = \sum_{i=1}^{m} S_i $$

represent the individual class scatter matrices and the
collective within-class scatter matrix, respectively. If
one defines a total scatter matrix $S_T$ in terms of a global
mean M

$$ M = \frac{1}{N} \sum_{i=1}^{m} K_i \mu^{(i)}, \qquad N = \sum_{i=1}^{m} K_i \tag{C-20} $$

$$ S_T = \sum_{\text{all samples}} (x - M)(x - M)^T \tag{C-21} $$

we find that

$$ S_T = \sum_{i=1}^{m} S_i + \sum_{i=1}^{m} K_i \left(\mu^{(i)} - M\right)\left(\mu^{(i)} - M\right)^T \tag{C-22} $$

provides a natural definition for both inter-set and
intra-set scatter matrices:

$$ S_W = \sum_{i=1}^{m} S_i \tag{C-23} $$

$$ S_B = \sum_{i=1}^{m} K_i \left(\mu^{(i)} - M\right)\left(\mu^{(i)} - M\right)^T \tag{C-24} $$
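
The decomposition of the total scatter into within-class and between-class parts can be verified numerically. The sketch assumes unnormalized scatter sums and a between-class term weighted by the number of samples in each class; the helper names and the two small example classes are hypothetical.

```python
def mean(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def add(a, b):
    """Element-wise sum of two matrices."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def scatter(vectors, centre, weight=1.0):
    """Sum of weighted outer products of (x - centre) with itself."""
    d = len(centre)
    s = [[0.0] * d for _ in range(d)]
    for x in vectors:
        diff = [xi - ci for xi, ci in zip(x, centre)]
        s = add(s, [[weight * di * dj for dj in diff] for di in diff])
    return s


def scatter_matrices(classes):
    """Return (within-class, between-class, total) scatter matrices."""
    all_samples = [x for c in classes for x in c]
    d = len(all_samples[0])
    global_mean = mean(all_samples)
    s_w = [[0.0] * d for _ in range(d)]
    s_b = [[0.0] * d for _ in range(d)]
    for c in classes:
        mu = mean(c)
        s_w = add(s_w, scatter(c, mu))
        s_b = add(s_b, scatter([mu], global_mean, weight=len(c)))
    return s_w, s_b, scatter(all_samples, global_mean)


classes = [[[0.0, 0.0], [2.0, 0.0]], [[4.0, 4.0], [6.0, 4.0], [5.0, 7.0]]]
s_w, s_b, s_t = scatter_matrices(classes)
print(s_w, s_b, s_t)
```

A large between-class scatter relative to the within-class scatter indicates well-separated classes, which is what the trace and determinant criteria measure.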


These three matrices not only provide the basis for multiple
discriminant analysis [7, 14, 18], but their trace or
determinant can also serve as a measure of separability for
use in an algorithm involving either linear combinations
of features or discrete feature selection. For example,
as criteria for separability, one might choose to maximize
the quality function

$$ Q_1 = \mathrm{tr}\left(S_W^{-1} S_B\right) \tag{C-25} $$

or to minimize one of the quality functions
All of the above criteria are particularly interesting in
that they can be evaluated as a result of multiple
discriminant analysis using linear combinations of features.

For a c-class problem we can accomplish the projection
of the d-dimensional feature space onto some alternative
space of not more than c-1 dimensions through the use
of a non-unique d x (c-1) matrix W.

It is a straightforward problem to show that the resultant
scatter matrices $\tilde{S}_W$, $\tilde{S}_B$ in this projection
space are

$$ \tilde{S}_W = W^T S_W W \tag{C-28} $$

$$ \tilde{S}_B = W^T S_B W \tag{C-29} $$

If one then chooses W to maximize the ratio of between- to
within-class scatter matrix determinants,

$$ Q = \frac{|\tilde{S}_B|}{|\tilde{S}_W|} = \frac{|W^T S_B W|}{|W^T S_W W|} \tag{C-30} $$

there is a solution of the form

$$ S_B w_i = \lambda_i S_W w_i \tag{C-31} $$

If $S_W$ is non-singular one can directly solve an eigenvalue
problem related in form to the Karhunen-Loève problem,
Equation C-7:

$$ \left(S_W^{-1} S_B\right) w_i = \lambda_i w_i \tag{C-32} $$

Or one may then solve the characteristic polynomial

$$ |S_B - \lambda_i S_W| = 0 \tag{C-33} $$

for the largest eigenvalues and find the corresponding
eigenvectors from

$$ \left(S_B - \lambda_i S_W\right) w_i = 0 \tag{C-34} $$


If the m vectors $\mu^{(i)}$ are linearly
independent, there will be not more than m-1 non-zero
eigenvalues for this problem. If the within-class scatter
matrix is isotropic, which is the case if all features are
independent and suitably normalized, the resultant
eigenvectors span the space defined by the vector set
$(\mu^{(i)} - M)$ and may be established by Gram-Schmidt
orthogonalization. The eigenvalues of this problem serve to
characterize the trace and determinant criteria. For example,
the use of all features leads to the criteria values
As was previously mentioned, the criteria described
here are best suited to unimodal distributions and increase
the theoretically achievable error rate. In those cases
where it can be established that a unimodal distribution is
inappropriate, the use of separability measures more closely
related to the estimated conditional probability density is
indicated [16]. For example, one might use a quality measure:

$$ Q = \frac{1}{N} \sum_{j=1}^{N} \left\{ \hat{p}^2\left(F(x_j)\right) - \sum_{i} \left[ p(C_i)\, \hat{p}\left(F(x_j)|C_i\right) \right]^2 \right\} \tag{C-38} $$

where N is the total number of training set samples,
F(x) represents some selection of features from the initial
feature space, and

$$ \hat{p}\left(F(x)\right) = \sum_{i} p(C_i)\, \hat{p}\left(F(x)|C_i\right) \tag{C-39} $$

Q is in fact a measure of overlap and has an optimal value
of zero.
3. Feature Search Algorithms

A cursory inspection of the problem of finding those
n of m features which optimize one of the criteria given in
the previous section shows that

$$ \binom{m}{n} = \frac{m!}{n!\,(m-n)!} \tag{C-40} $$

evaluations must be performed if one is to conduct an
exhaustive search of all possible feature combinations. The
exhaustive search for the best ten of twenty features, for
example, requires consideration of more than 184,000
possibilities. Typically, the practical feature search employs
suboptimal procedures that may be justified if simplifying
assumptions regarding the nature of the class distributions
are allowed.
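
The size of the exhaustive search is easy to confirm. The short sketch below evaluates the number of n-feature subsets of m features with the standard library; the function name `exhaustive_evaluations` is hypothetical.

```python
import math


def exhaustive_evaluations(m, n):
    """Number of distinct n-feature subsets that can be drawn from m features."""
    return math.comb(m, n)


count = exhaustive_evaluations(20, 10)
print(count)
```

For ten of twenty features the count is 184,756, and it grows rapidly as m increases, which is why suboptimal searches are used in practice.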
The simplest suboptimal search procedure is to assume
the independence of features and evaluate the desired
criteria function for all single features. These m features
are then ranked in order of decreasing optimality and the
first n are chosen. The number of criteria evaluations is
the same as the dimension of the initial feature space.

A second search method, used by Mucciardi and Gose
[17], involves a technique known as search without replacement.
This method requires $N_e$ criteria evaluations where

$$ N_e = n\left(m - \frac{n-1}{2}\right) \tag{C-41} $$
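
The two suboptimal procedures can be contrasted on a toy problem. The sketch assumes that search without replacement means keeping the best feature found so far and, at each stage, adding the single remaining feature that most improves the criterion. The criterion `toy_criterion`, its scores, and the feature names are hypothetical and chosen so that two features are nearly redundant.

```python
def rank_single_features(features, criterion, n):
    """Rank features by their individual criterion value; keep the first n."""
    scores = {f: criterion((f,)) for f in features}       # m evaluations
    return sorted(features, key=scores.get, reverse=True)[:n]


def search_without_replacement(features, criterion, n):
    """Add, one stage at a time, the feature that best extends the chosen set."""
    chosen, remaining, evaluations = [], list(features), 0
    for _ in range(n):
        scored = []
        for f in remaining:
            scored.append((criterion(tuple(chosen) + (f,)), f))
            evaluations += 1
        best = max(scored)[1]
        chosen.append(best)
        remaining.remove(best)
    return chosen, evaluations


SCORES = {"a": 5.0, "b": 4.9, "c": 3.0, "d": 1.0}


def toy_criterion(subset):
    """Additive separability score, except that 'a' and 'b' are nearly redundant."""
    total = sum(SCORES[f] for f in subset)
    if "a" in subset and "b" in subset:
        total -= 4.8
    return total


ranked = rank_single_features(list(SCORES), toy_criterion, 2)
chosen, evaluations = search_without_replacement(list(SCORES), toy_criterion, 2)
print(ranked, chosen, evaluations)
```

Single-feature ranking keeps the redundant pair, while the staged search replaces the second member with a less correlated feature at the cost of m + (m - 1) evaluations.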
Fu [9] has proposed a technique of sequential feature
selection which can be modified to include the cost of
feature measurement, and Chang [3] has developed alternative
dynamic programming approaches to feature selection, one of
which requires only a slightly greater number of evaluations
$N_d$ than the search without replacement algorithm.


E 


= Pp ee - _ " 
Ny = n(m 5 ) + n-2(m-1) C-42) 





The single feature ranking technique has the
disadvantage of ignoring the effects of feature correlations.
The search without replacement technique provides a method
for treating these correlations, but it also presumes that
the features obtained through n stages of the conditional
single feature evaluation procedure are the n best features,
which need not be the case. In general, as one progresses to
successively more complicated search procedures, more of
the effects of feature dependence may be taken into account.
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